From owner-freebsd-fs@freebsd.org Sun Aug 23 21:00:26 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2F1939C08A8 for ; Sun, 23 Aug 2015 21:00:26 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0A6BE1D3C for ; Sun, 23 Aug 2015 21:00:26 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t7NL0PLX093886 for ; Sun, 23 Aug 2015 21:00:25 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <201508232100.t7NL0PLX093886@kenobi.freebsd.org> From: bugzilla-noreply@FreeBSD.org To: freebsd-fs@FreeBSD.org Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 Date: Sun, 23 Aug 2015 21:00:25 +0000 Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Aug 2015 21:00:26 -0000 To view an individual PR, use: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id). The following is a listing of current problems submitted by FreeBSD users, which need special attention. These represent problem reports covering all versions including experimental development code and obsolete releases. Status | Bug Id | Description ------------+-----------+--------------------------------------------------- Open | 136470 | [nfs] Cannot mount / in read-only, over NFS Open | 139651 | [nfs] mount(8): read-only remount of NFS volume d Open | 144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f 3 problems total for which you should take action. From owner-freebsd-fs@freebsd.org Sun Aug 23 21:22:21 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E18E9C0E2B for ; Sun, 23 Aug 2015 21:22:21 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (ns1.scsiguy.com [70.89.174.89]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 62ABDDE8; Sun, 23 Aug 2015 21:22:20 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from [192.168.0.68] ([192.168.0.68]) (authenticated bits=0) by aslan.scsiguy.com (8.15.2/8.15.2) with ESMTPSA id t7NLM6Qi086121 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 23 Aug 2015 15:22:06 -0600 (MDT) (envelope-from gibbs@scsiguy.com) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: solaris assert: avl_is_empty(&dn -> dn_dbufs) panic From: "Justin T. Gibbs" In-Reply-To: <55D7700B.3080207@delphij.net> Date: Sun, 23 Aug 2015 15:22:07 -0600 Cc: Don Lewis , freebsd-fs@FreeBSD.org, "Justin T. 
Gibbs" , George Wilson Content-Transfer-Encoding: quoted-printable Message-Id: References: <201508211748.t7LHmo96096088@gw.catspoiler.org> <55D7700B.3080207@delphij.net> To: d@delphij.net X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Aug 2015 21:22:21 -0000 Hi, I'll need a little time to fully reload the context for these changes. = However, reintroducing a blocking loop is not the right fix - it was a = hack in the original code. :-) My hunch is that removing the assert is = safe, but it would be nice to have a core dump to better understand why = the list isn't empty. -- Justin > On Aug 21, 2015, at 12:38 PM, Xin Li wrote: >=20 > Hi, >=20 > A quick glance at the changes suggests that Justin's changeset may be > related. The reasoning is here: >=20 > https://reviews.csiden.org/r/131/ >=20 > Related Illumos ticket: >=20 > https://www.illumos.org/issues/5056 >=20 > In dnode_evict_dbufs(), remove multiple passes over dn->dn_dbufs. > This is possible now that objset eviction is asynchronously > completed in a different context once dbuf eviction completes. >=20 > In the case of objset eviction, any dbufs held by children will > be evicted via dbuf_rele_and_unlock() once their refcounts go > to zero. Even when objset eviction is not active, the ordering > of the avl tree guarantees that children will be released before > parents, allowing the parent's refcounts to naturally drop to > zero before they are inspected in this single loop. >=20 > =3D=3D=3D=3D >=20 > So, upon return from dnode_evict_dbufs(), there could be some > DB_EVICTING buffers on the AVL pending release and thus breaks the > invariant. >=20 > Should we restore the loop where we yield briefly with the lock > released, then reacquire and recheck? >=20 > Cheers, >=20 > On 08/21/15 10:48, Don Lewis wrote: >> On 21 Aug, Don Lewis wrote: >>> On 21 Aug, Don Lewis wrote: >>>> I just started getting this panic: >>>>=20 >>>> solaris assert: avl_is_empty(&dn -> dn_dbufs), file: >>>> = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c, >>>> line 495 >>>>=20 >>>> System info: >>>> FreeBSD zipper.catspoiler.org 11.0-CURRENT FreeBSD 11.0-CURRENT #25 = r286923: Wed Aug 19 09:28:53 PDT 2015 = dl@zipper.catspoiler.org:/usr/obj/usr/src/sys/GENERIC amd64 >>>>=20 >>>> My zfs pool has one mirrored vdev. Scrub doesn't find any = problems. >>>>=20 >>>> %zpool status >>>> pool: zroot >>>> state: ONLINE >>>> scan: scrub repaired 0 in 2h58m with 0 errors on Fri Aug 21 = 00:44:52 2015 >>>> config: >>>>=20 >>>> NAME STATE READ WRITE CKSUM >>>> zroot ONLINE 0 0 0 >>>> mirror-0 ONLINE 0 0 0 >>>> ada0p3 ONLINE 0 0 0 >>>> ada1p3 ONLINE 0 0 0 >>>>=20 >>>> This panic is reproduceable and happens every time I use poudriere = to >>>> build ports using my 9.3-RELEASE amd64 jail and occurs at the end = of the >>>> poudriere run when it is unmounting filesystems. 
>>>>=20 >>>> [00:10:43] =3D=3D=3D=3D>> Stopping 4 builders >>>> 93amd64-default-job-01: removed >>>> 93amd64-default-job-01-n: removed >>>> 93amd64-default-job-02: removed >>>> 93amd64-default-job-02-n: removed >>>> 93amd64-default-job-03: removed >>>> 93amd64-default-job-03-n: removed >>>> 93amd64-default-job-04: removed >>>> 93amd64-default-job-04-n: removed >>>> [00:10:46] =3D=3D=3D=3D>> Creating pkgng repository >>>> Creating repository in /tmp/packages: 100% >>>> Packing files for repository: 100% >>>> [00:10:55] =3D=3D=3D=3D>> Committing packages to repository >>>> [00:10:55] =3D=3D=3D=3D>> Removing old packages >>>> [00:10:55] =3D=3D=3D=3D>> Built ports: devel/py-pymtbl net/sie-nmsg = net/p5-Net-Nmsg net/axa >>>> [93amd64-default] [2015-08-21_00h47m41s] [committing:] Queued: 4 = Built: 4 Failed: 0 Skipped: 0 Ignored: 0 Tobuild: 0 Time: 00:10:53 >>>> [00:10:55] =3D=3D=3D=3D>> Logs: = /var/poudriere/data/logs/bulk/93amd64-default/2015-08-21_00h47m41s >>>> [00:10:55] =3D=3D=3D=3D>> Cleaning up >>>> 93amd64-default: removed >>>> 93amd64-default-n: removed >>>> [00:10:55] =3D=3D=3D=3D>> Umounting file systems >>>> Write failed: Broken pipe >>>>=20 >>>> Prior to that, I ran poudriere a number of times with a 10.2-STABLE >>>> amd64 jail without incident. >>>>=20 >>>> I've kicked off a bunch of poudriere runs for other jails and >>>> will check on it in the morning. >>>=20 >>> Died the same way after building ports on the first jail, >>> 10.1-RELEASE amd64. >>>=20 >>> Since there have been some zfs commits since r286923, I upgraded to >>> r286998 this morning and tried again with no better luck. I got the >>> same panic again. >>>=20 >>> This machine has mirrored swap, and even though I've done what >>> gmirror(8) says to do in order to capture crash dumps, I've had no = luck >>> with that. The dump is getting written, but savecore is unable to = find >>> it. >>=20 >> Not sure what is happening with savecore during boot, but I was able = to >> run it manually and collect the crash dump. 
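(A minimal sketch of pulling the dump by hand, assuming a gmirror swap provider named /dev/mirror/swap and the default /var/crash directory; the device name here is an example, not taken from this report:)

# tell the kernel where to write crash dumps, then extract the last one
dumpon /dev/mirror/swap
savecore -v /var/crash /dev/mirror/swap
ls -l /var/crash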
>>=20 >>=20 >> Unread portion of the kernel message buffer: >> panic: solaris assert: avl_is_empty(&dn->dn_dbufs), file: = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c, = line: 495 >> cpuid =3D 1 >> KDB: stack backtrace: >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame = 0xfffffe0859e4e4e0 >> vpanic() at vpanic+0x189/frame 0xfffffe0859e4e560 >> panic() at panic+0x43/frame 0xfffffe0859e4e5c0 >> assfail() at assfail+0x1a/frame 0xfffffe0859e4e5d0 >> dnode_sync() at dnode_sync+0x6c8/frame 0xfffffe0859e4e6b0 >> dmu_objset_sync_dnodes() at dmu_objset_sync_dnodes+0x2b/frame = 0xfffffe0859e4e6e0 >> dmu_objset_sync() at dmu_objset_sync+0x29e/frame 0xfffffe0859e4e7b0 >> dsl_pool_sync() at dsl_pool_sync+0x348/frame 0xfffffe0859e4e820 >> spa_sync() at spa_sync+0x442/frame 0xfffffe0859e4e910 >> txg_sync_thread() at txg_sync_thread+0x23d/frame 0xfffffe0859e4e9f0 >> fork_exit() at fork_exit+0x84/frame 0xfffffe0859e4ea30 >> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0859e4ea30 >> --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- >> KDB: enter: panic >>=20 >>=20 >>=20 >> (kgdb) bt >> #0 doadump (textdump=3D0) at pcpu.h:221 >> #1 0xffffffff8037bb86 in db_fncall (dummy1=3D,=20= >> dummy2=3D, dummy3=3D,=20 >> dummy4=3D) at = /usr/src/sys/ddb/db_command.c:568 >> #2 0xffffffff8037b941 in db_command (cmd_table=3D0x0) >> at /usr/src/sys/ddb/db_command.c:440 >> #3 0xffffffff8037b5d4 in db_command_loop () >> at /usr/src/sys/ddb/db_command.c:493 >> #4 0xffffffff8037e18b in db_trap (type=3D, = code=3D0) >> at /usr/src/sys/ddb/db_main.c:251 >> #5 0xffffffff80a5b294 in kdb_trap (type=3D3, code=3D0, tf=3D) >> at /usr/src/sys/kern/subr_kdb.c:654 >> #6 0xffffffff80e6a4b1 in trap (frame=3D0xfffffe0859e4e410) >> at /usr/src/sys/amd64/amd64/trap.c:540 >> #7 0xffffffff80e49f22 in calltrap () >> at /usr/src/sys/amd64/amd64/exception.S:235 >> #8 0xffffffff80a5a96e in kdb_enter (why=3D0xffffffff81379010 = "panic",=20 >> msg=3D0xffffffff80a60b60 = "UH\211\ufffdAWAVATSH\203\ufffdPI\211\ufffdA\211\ufffdH\213\004%Py\ufffd\2= 01H\211E\ufffd\201<%\ufffd\210\ufffd\201") at cpufunc.h:63 >> #9 0xffffffff80a1e2c9 in vpanic (fmt=3D,=20 >> ap=3D) at = /usr/src/sys/kern/kern_shutdown.c:619 >> #10 0xffffffff80a1e333 in panic (fmt=3D0xffffffff81aafa90 "\004") >> at /usr/src/sys/kern/kern_shutdown.c:557 >> ---Type to continue, or q to quit--- >> #11 0xffffffff8240922a in assfail (a=3D,=20 >> f=3D, l=3D) >> at = /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81 >> #12 0xffffffff820d4f78 in dnode_sync (dn=3D0xfffff8040b72d3d0,=20 >> tx=3D0xfffff8001598ec00) >> at = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c:495 >> #13 0xffffffff820c922b in dmu_objset_sync_dnodes = (list=3D0xfffff80007712b90,=20 >> newlist=3D, tx=3D) >> at = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1045 >> #14 0xffffffff820c8ede in dmu_objset_sync (os=3D0xfffff80007712800,=20= >> pio=3D, tx=3D0xfffff8001598ec00) >> at = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1163 >> #15 0xffffffff820e8e78 in dsl_pool_sync (dp=3D0xfffff80015676000, = txg=3D2660975) >> at = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:536 >> #16 0xffffffff8210dca2 in spa_sync (spa=3D0xfffffe00089c6000, = txg=3D2660975) >> at = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:6641 >> #17 0xffffffff8211843d in txg_sync_thread (arg=3D0xfffff80015676000) >> at = /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:517 >> #18 
0xffffffff809e47c4 in fork_exit ( >> callout=3D0xffffffff82118200 , = arg=3D0xfffff80015676000,=20 >> frame=3D0xfffffe0859e4ea40) at /usr/src/sys/kern/kern_fork.c:1006 >> ---Type to continue, or q to quit--- >> #19 0xffffffff80e4a45e in fork_trampoline () >> at /usr/src/sys/amd64/amd64/exception.S:610 >> #20 0x0000000000000000 in ?? () >> Current language: auto; currently minimal >>=20 >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>=20 >=20 >=20 > --=20 > Xin LI https://www.delphij.net/ > FreeBSD - The Power to Serve! Live free or die >=20 From owner-freebsd-fs@freebsd.org Mon Aug 24 08:11:34 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A72DC9BF2EA for ; Mon, 24 Aug 2015 08:11:34 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8D184137F for ; Mon, 24 Aug 2015 08:11:34 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t7O8BYqb068121 for ; Mon, 24 Aug 2015 08:11:34 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 195746] zfs L2ARC wrong alloc/free size Date: Mon, 24 Aug 2015 08:11:32 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.1-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: commit-hook@freebsd.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Aug 2015 08:11:34 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195746 --- Comment #3 from commit-hook@freebsd.org --- A commit references this bug: Author: avg Date: Mon Aug 24 08:10:53 UTC 2015 New revision: 287099 URL: https://svnweb.freebsd.org/changeset/base/287099 Log: account for ashift when gathering buffers to be written to l2arc device The change that introduced the L2ARC compression support also introduced a bug where the on-disk size of the selected buffers could end up larger than the target size if the ashift is greater than 9. This was because the buffer selection could did not take into account the fact that on-disk size could be larger than the in-memory buffer size due to the alignment requirements. 
At the moment b_asize is a misnomer as it does not always represent the allocated size: if a buffer is compressed, then the compressed size is properly rounded (on FreeBSD), but if the compression fails or it is not applied, then the original size is kept and it could be smaller than what ashift requires. For the same reasons arcstat_l2_asize and the reported used space on the cache device could be smaller than the actual allocated size if ashift > 9. That problem is not fixed by this change. This change only ensures that l2ad_hand is not advanced by more than target_sz. Otherwise we would overwrite active (unevicted) L2ARC buffers. That problem is manifested as growing l2_cksum_bad and l2_io_error counters. This change also changes 'p' prefix to 'a' prefix in a few places where variables represent allocated rather than physical size. The resolved problem could also result in the reported allocated size being greater than the cache device's capacity, because of the overwritten buffers (more than one buffer claiming the same disk space). This change is already in ZFS-on-Linux: zfsonlinux/zfs@ef56b0780c80ebb0b1e637b8b8c79530a8ab3201 PR: 198242 PR: 195746 (possibly related) Reviewed by: mahrens (https://reviews.csiden.org/r/229/) Tested by: gkontos@aicom.gr (most recently) MFC after: 15 days X-MFC note: patch does not apply as is at the moment Relnotes: yes Sponsored by: ClusterHQ Differential Revision: https://reviews.freebsd.org/D2764 Reviewed by: noone (@FreeBSD.org) Changes: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Mon Aug 24 08:18:13 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 14A6B9BF5B5 for ; Mon, 24 Aug 2015 08:18:13 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 01542182B for ; Mon, 24 Aug 2015 08:18:13 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t7O8ICwo074992 for ; Mon, 24 Aug 2015 08:18:12 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 195746] zfs L2ARC wrong alloc/free size Date: Mon, 24 Aug 2015 08:18:12 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.1-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: avg@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to bug_status Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Mon, 24 Aug 2015 08:18:13 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195746 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-fs@FreeBSD.org |avg@FreeBSD.org Status|New |In Progress -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Mon Aug 24 23:47:12 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3E6789C17E4 for ; Mon, 24 Aug 2015 23:47:12 +0000 (UTC) (envelope-from jason.unovitch@gmail.com) Received: from mail-qk0-x22e.google.com (mail-qk0-x22e.google.com [IPv6:2607:f8b0:400d:c09::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EE4081BE5 for ; Mon, 24 Aug 2015 23:47:11 +0000 (UTC) (envelope-from jason.unovitch@gmail.com) Received: by qkfh127 with SMTP id h127so91699220qkf.1 for ; Mon, 24 Aug 2015 16:47:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=h/pxgzqdNcHl3z4cea1YemR8zL9MnowaRFE3iE9iVmI=; b=vbchplTiPC8KhvXyp/pLX/M6irDg/29DH3UeSrnoVjFcSWtWZ5fgh2ATa4N4sKE8a0 qpMPv5itE6DsG2vW9uY5bqs+zaeahIMp6hGQghOALFbXoX6may0o1NogoOqn8dqKLDCu yELaUyfxTPIcmCATln2ZWbPVSJ8Q1s8xbL9XOu8T9tz64oFCtH7uV1Pem1qmzdSmr9YY LMEyY2upIYiwolZbRctGkVij94OeRm2iLS/CqwpvUpQbltuyNE+k1ErzwCjRWdgOcUYJ WsW4obD3cw8iY/yBkBVXlW+vMFFAuW/4byKKrEDLF1NnOOB0xcnZ2QEPqI8eydbUX7qz 4asA== X-Received: by 10.55.42.65 with SMTP id q62mr36525240qkh.12.1440460031054; Mon, 24 Aug 2015 16:47:11 -0700 (PDT) Received: from Silverstone.nc-us.unovitch.com ([2606:a000:5687:de02:be5f:f4ff:fe5d:f28]) by smtp.gmail.com with ESMTPSA id t105sm12413768qgd.5.2015.08.24.16.47.10 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Aug 2015 16:47:10 -0700 (PDT) Date: Mon, 24 Aug 2015 19:47:08 -0400 From: Jason Unovitch To: freebsd-fs@freebsd.org Subject: Re: solaris assert: avl_is_empty(&dn -> dn_dbufs) panic Message-ID: <20150824234708.GA9687@Silverstone.nc-us.unovitch.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Aug 2015 23:47:12 -0000 For reference I opened https://bugs.FreeBSD.org/202607 before I came across this discussion. > Hi, > > I'll need a little time to fully reload the context for these changes. However, reintroducing a blocking loop is not the right fix - it was a hack in the original code. :-) My hunch is that removing the assert is safe, but it would be nice to have a core dump to better understand why the list isn't empty. > > -- > Justin > Justin, I have the contents of my /var/crash available. I also have a beadm boot environment of a known bad (r287028) as well as the known good (r286204) that I am currently running on. I should be able to replicate this as needed and provide some assistance. 
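(For reference, boot environments make that sort of bisection cheap; a rough sketch with made-up BE names, since the actual names were not posted:)

# list the available boot environments and boot the known-bad kernel next
beadm list
beadm activate r287028-known-bad
shutdown -r now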
> > On Aug 21, 2015, at 12:38 PM, Xin Li wrote:
> >
> > Hi,
> >
> > A quick glance at the changes suggests that Justin's changeset may be
> > related. The reasoning is here:
> >
> > https://reviews.csiden.org/r/131/
> >
> > Related Illumos ticket:
> >
> > https://www.illumos.org/issues/5056
> >
> > In dnode_evict_dbufs(), remove multiple passes over dn->dn_dbufs.
> > This is possible now that objset eviction is asynchronously
> > completed in a different context once dbuf eviction completes.
> >
> > In the case of objset eviction, any dbufs held by children will
> > be evicted via dbuf_rele_and_unlock() once their refcounts go
> > to zero. Even when objset eviction is not active, the ordering
> > of the avl tree guarantees that children will be released before
> > parents, allowing the parent's refcounts to naturally drop to
> > zero before they are inspected in this single loop.
> >
> > ====
> >
> > So, upon return from dnode_evict_dbufs(), there could be some
> > DB_EVICTING buffers on the AVL pending release and thus breaks the
> > invariant.
> >
> > Should we restore the loop where we yield briefly with the lock
> > released, then reacquire and recheck?
> >
> > Cheers,

Jason

From owner-freebsd-fs@freebsd.org Tue Aug 25 20:18:46 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 452779C38F6 for ; Tue, 25 Aug 2015 20:18:46 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 2F178157F; Tue, 25 Aug 2015 20:18:46 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (localhost [IPv6:::1]) by freefall.freebsd.org (Postfix) with ESMTP id 27A3411C1; Tue, 25 Aug 2015 20:18:46 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (localhost [172.31.3.2]) by mail.xzibition.com (Postfix) with ESMTP id D511D796D; Tue, 25 Aug 2015 20:18:45 +0000 (UTC) X-Virus-Scanned: amavisd-new at mail.xzibition.com Received: from mail.xzibition.com ([172.31.3.2]) by mail.xzibition.com (mail.xzibition.com [172.31.3.2]) (amavisd-new, port 10026) with LMTP id 7f8sipBLqhAj; Tue, 25 Aug 2015 20:18:43 +0000 (UTC) To: "freebsd-fs@freebsd.org" DKIM-Filter: OpenDKIM Filter v2.9.2 mail.xzibition.com 7DC987966 Cc: Andriy Gapon , Alexander Motin From: Bryan Drewery Subject: l2arc_feed_thread page fault on r287138 Openpgp: id=F9173CB2C3AAEA7A5C8A1F0935D771BB6E4697CF;
url=http://www.shatow.net/bryan/bryan2.asc Organization: FreeBSD Message-ID: <55DCCDA3.8030605@FreeBSD.org> Date: Tue, 25 Aug 2015 13:18:43 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="eQ5U1jlDW8NXiXre4q6m6nd72ttObDrmA" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Aug 2015 20:18:46 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --eQ5U1jlDW8NXiXre4q6m6nd72ttObDrmA Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable > Fatal trap 12: page fault while in kernel mode > cpuid =3D 2; apic id =3D 02 > fault virtual address =3D 0xc4 > fault code =3D supervisor read data, page not present > instruction pointer =3D 0x20:0xffffffff80ce6c63 > stack pointer =3D 0x28:0xfffffe3553e689d0 > frame pointer =3D 0x28:0xfffffe3553e68a00 > code segment =3D base 0x0, limit 0xfffff, type 0x1b > =3D DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags =3D interrupt enabled, resume, IOPL =3D 0 > current process =3D 6 (l2arc_feed_thread) > [ thread pid 6 tid 100172 ] > Stopped at uma_dbg_free+0x53: movl 0xc4(%rdi),%esi > db> bt > Tracing pid 6 tid 100172 td 0xfffff8012551c9a0 > uma_dbg_free() at uma_dbg_free+0x53/frame 0xfffffe3553e68a00 > uma_zfree_arg() at uma_zfree_arg+0xaf/frame 0xfffffe3553e68a60 > l2arc_feed_thread() at l2arc_feed_thread+0xe0b/frame 0xfffffe3553e68bb0= > fork_exit() at fork_exit+0x84/frame 0xfffffe3553e68bf0 > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe3553e68bf0 > --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- > db> --=20 Regards, Bryan Drewery --eQ5U1jlDW8NXiXre4q6m6nd72ttObDrmA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBAgAGBQJV3M2jAAoJEDXXcbtuRpfPeVMIAJCueOczN6fGp/zvyXVHQ2Fe 7kH5Wngreug0890WVo6pAV0zkg/z/AA8J7oiu33K9twKMhC0GgfRuI20HrygFcj6 ai/Im2smOTxxTI3dDox0cE4JXkPsVzcF7kFCiqCayoIS4XPtnP8A3h9V7c+CllMG oq+DGe9vjAo5H90iQcy++zBuy36QUMs1yYE4t9hFzt4QWwkD0vDbJ6OVokTLHdiA TPfxMw13mkoaIx8qhmivluKFWHdH0dGg/sn8knjaTjIsvleaB0tQaVvG5o6mwQ9O 2a7j0G7o6cV4DPi8X/f960Yllged3X/5H0sUTwA2cDinjvWSbmaH0RrI8bOpEHA= =n3Ky -----END PGP SIGNATURE----- --eQ5U1jlDW8NXiXre4q6m6nd72ttObDrmA-- From owner-freebsd-fs@freebsd.org Tue Aug 25 20:21:28 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3E1169C39E4 for ; Tue, 25 Aug 2015 20:21:28 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 10657188C; Tue, 25 Aug 2015 20:21:26 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA10043; Tue, 25 Aug 2015 23:21:25 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZUKii-000KfC-Av; Tue, 25 Aug 2015 23:21:24 +0300 Subject: Re: l2arc_feed_thread page fault on r287138 To: Bryan 
Drewery , "freebsd-fs@freebsd.org" References: <55DCCDA3.8030605@FreeBSD.org> Cc: Alexander Motin From: Andriy Gapon Message-ID: <55DCCE0D.8050902@FreeBSD.org> Date: Tue, 25 Aug 2015 23:20:29 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <55DCCDA3.8030605@FreeBSD.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Aug 2015 20:21:28 -0000 On 25/08/2015 23:18, Bryan Drewery wrote: >> Fatal trap 12: page fault while in kernel mode >> cpuid = 2; apic id = 02 >> fault virtual address = 0xc4 >> fault code = supervisor read data, page not present >> instruction pointer = 0x20:0xffffffff80ce6c63 >> stack pointer = 0x28:0xfffffe3553e689d0 >> frame pointer = 0x28:0xfffffe3553e68a00 >> code segment = base 0x0, limit 0xfffff, type 0x1b >> = DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags = interrupt enabled, resume, IOPL = 0 >> current process = 6 (l2arc_feed_thread) >> [ thread pid 6 tid 100172 ] >> Stopped at uma_dbg_free+0x53: movl 0xc4(%rdi),%esi >> db> bt >> Tracing pid 6 tid 100172 td 0xfffff8012551c9a0 >> uma_dbg_free() at uma_dbg_free+0x53/frame 0xfffffe3553e68a00 >> uma_zfree_arg() at uma_zfree_arg+0xaf/frame 0xfffffe3553e68a60 >> l2arc_feed_thread() at l2arc_feed_thread+0xe0b/frame 0xfffffe3553e68bb0 >> fork_exit() at fork_exit+0x84/frame 0xfffffe3553e68bf0 >> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe3553e68bf0 >> --- trap 0, rip = 0, rsp = 0, rbp = 0 --- >> db> > A little bit more debug info like line numbers, argument values, etc would be nice. 
-- Andriy Gapon From owner-freebsd-fs@freebsd.org Tue Aug 25 20:22:04 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A94389C3AE7 for ; Tue, 25 Aug 2015 20:22:04 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 920A01A22; Tue, 25 Aug 2015 20:22:04 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (localhost [IPv6:::1]) by freefall.freebsd.org (Postfix) with ESMTP id 8B30C1292; Tue, 25 Aug 2015 20:22:04 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (localhost [172.31.3.2]) by mail.xzibition.com (Postfix) with ESMTP id 4C09E799C; Tue, 25 Aug 2015 20:22:04 +0000 (UTC) X-Virus-Scanned: amavisd-new at mail.xzibition.com Received: from mail.xzibition.com ([172.31.3.2]) by mail.xzibition.com (mail.xzibition.com [172.31.3.2]) (amavisd-new, port 10026) with LMTP id FWwN9Qhp0C9F; Tue, 25 Aug 2015 20:22:02 +0000 (UTC) Subject: Re: l2arc_feed_thread page fault on r287138 DKIM-Filter: OpenDKIM Filter v2.9.2 mail.xzibition.com E9E757997 To: Andriy Gapon , "freebsd-fs@freebsd.org" References: <55DCCDA3.8030605@FreeBSD.org> <55DCCE0D.8050902@FreeBSD.org> Cc: Alexander Motin From: Bryan Drewery Openpgp: id=F9173CB2C3AAEA7A5C8A1F0935D771BB6E4697CF; url=http://www.shatow.net/bryan/bryan2.asc Organization: FreeBSD Message-ID: <55DCCE6B.5010906@FreeBSD.org> Date: Tue, 25 Aug 2015 13:22:03 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <55DCCE0D.8050902@FreeBSD.org> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1Mo0UQoMRf89W4Uf0RgjsVi2LJakixIK6" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Aug 2015 20:22:04 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --1Mo0UQoMRf89W4Uf0RgjsVi2LJakixIK6 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 8/25/2015 1:20 PM, Andriy Gapon wrote: > On 25/08/2015 23:18, Bryan Drewery wrote: >>> Fatal trap 12: page fault while in kernel mode >>> cpuid =3D 2; apic id =3D 02 >>> fault virtual address =3D 0xc4 >>> fault code =3D supervisor read data, page not present >>> instruction pointer =3D 0x20:0xffffffff80ce6c63 >>> stack pointer =3D 0x28:0xfffffe3553e689d0 >>> frame pointer =3D 0x28:0xfffffe3553e68a00 >>> code segment =3D base 0x0, limit 0xfffff, type 0x1b >>> =3D DPL 0, pres 1, long 1, def32 0, gran 1 >>> processor eflags =3D interrupt enabled, resume, IOPL =3D 0 >>> current process =3D 6 (l2arc_feed_thread) >>> [ thread pid 6 tid 100172 ] >>> Stopped at uma_dbg_free+0x53: movl 0xc4(%rdi),%esi >>> db> bt >>> Tracing pid 6 tid 100172 td 0xfffff8012551c9a0 >>> uma_dbg_free() at uma_dbg_free+0x53/frame 0xfffffe3553e68a00 >>> uma_zfree_arg() at uma_zfree_arg+0xaf/frame 0xfffffe3553e68a60 >>> l2arc_feed_thread() at l2arc_feed_thread+0xe0b/frame 0xfffffe3553e68b= b0 >>> fork_exit() at fork_exit+0x84/frame 0xfffffe3553e68bf0 >>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe3553e68bf0 >>> --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- >>> db> >> >=20 > A little bit more debug info 
like line numbers, argument values, etc > would be nice. >=20 Sure, once I can actually get on the system. It panics quite quickly. --=20 Regards, Bryan Drewery --1Mo0UQoMRf89W4Uf0RgjsVi2LJakixIK6 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBAgAGBQJV3M5rAAoJEDXXcbtuRpfPvkgIAMI+auE5KY41FT8DId1n6Xp7 ISHrTnVSTBXPgIzXGOTnmgIQdz2eelY7gd8MvgNAgnVwfx0chldhGSwJkxvmxJq4 PsfaYuGLRX+n8VGeQQW16FLcDDv0H3USg30/UhG6j3obsfkCKbme2ss5+omgSWv1 P1MHZQMV21viRHtM6xy+zkwzaiy50Q3a6RA0N+cXtjmvnEq4h75MIAvQDN8FQaX9 +1VeUaYcLL4Nu24LjNAf4bPG78wAwFECtD6TA5/ALxTPucKUt6voteM6uE5ZFi2v vOOMQYgV5PGkX2QmqnpuyecmqjVDUu740bVoVLBtfCIN4oTThZfmlDseDV34w4A= =YWFr -----END PGP SIGNATURE----- --1Mo0UQoMRf89W4Uf0RgjsVi2LJakixIK6-- From owner-freebsd-fs@freebsd.org Tue Aug 25 20:27:47 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F00899C3CAE for ; Tue, 25 Aug 2015 20:27:47 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id D8D8C1BD3; Tue, 25 Aug 2015 20:27:47 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (localhost [IPv6:::1]) by freefall.freebsd.org (Postfix) with ESMTP id CCA311375; Tue, 25 Aug 2015 20:27:47 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from mail.xzibition.com (localhost [172.31.3.2]) by mail.xzibition.com (Postfix) with ESMTP id 8967C79BA; Tue, 25 Aug 2015 20:27:47 +0000 (UTC) X-Virus-Scanned: amavisd-new at mail.xzibition.com Received: from mail.xzibition.com ([172.31.3.2]) by mail.xzibition.com (mail.xzibition.com [172.31.3.2]) (amavisd-new, port 10026) with LMTP id uM8COdT6AFvL; Tue, 25 Aug 2015 20:27:44 +0000 (UTC) Subject: Re: l2arc_feed_thread page fault on r287138 DKIM-Filter: OpenDKIM Filter v2.9.2 mail.xzibition.com 83ED179B5 To: "freebsd-fs@freebsd.org" References: <55DCCDA3.8030605@FreeBSD.org> Cc: Andriy Gapon , Alexander Motin From: Bryan Drewery Openpgp: id=F9173CB2C3AAEA7A5C8A1F0935D771BB6E4697CF; url=http://www.shatow.net/bryan/bryan2.asc Organization: FreeBSD Message-ID: <55DCCFC0.3080307@FreeBSD.org> Date: Tue, 25 Aug 2015 13:27:44 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <55DCCDA3.8030605@FreeBSD.org> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="lhUlwXwEe6V91RlxhnP7dlf4OpuvrLn8Q" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Aug 2015 20:27:48 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --lhUlwXwEe6V91RlxhnP7dlf4OpuvrLn8Q Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 8/25/2015 1:18 PM, Bryan Drewery wrote: >> Fatal trap 12: page fault while in kernel mode >> cpuid =3D 2; apic id =3D 02 >> fault virtual address =3D 0xc4 >> fault code =3D supervisor read data, page not present >> instruction pointer =3D 0x20:0xffffffff80ce6c63 >> stack pointer =3D 0x28:0xfffffe3553e689d0 >> frame pointer =3D 0x28:0xfffffe3553e68a00 >> code segment =3D base 
0x0, limit 0xfffff, type 0x1b >> =3D DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags =3D interrupt enabled, resume, IOPL =3D 0 >> current process =3D 6 (l2arc_feed_thread) >> [ thread pid 6 tid 100172 ] >> Stopped at uma_dbg_free+0x53: movl 0xc4(%rdi),%esi >> db> bt >> Tracing pid 6 tid 100172 td 0xfffff8012551c9a0 >> uma_dbg_free() at uma_dbg_free+0x53/frame 0xfffffe3553e68a00 >> uma_zfree_arg() at uma_zfree_arg+0xaf/frame 0xfffffe3553e68a60 >> l2arc_feed_thread() at l2arc_feed_thread+0xe0b/frame 0xfffffe3553e68bb= 0 >> fork_exit() at fork_exit+0x84/frame 0xfffffe3553e68bf0 >> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe3553e68bf0 >> --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 --- >> db> >=20 Also: > Fatal trap 12: page fault while in kernel mode > cpuid =3D 4; apic id =3D 04 > fault virtual address =3D 0xc4 > fault code =3D supervisor read data, page not present > instruction pointer =3D 0x20:0xffffffff80ce6c63 > stack pointer =3D 0x28:0xfffffe3553e689d0 > frame pointer =3D 0x28:0xfffffe3553e68a00 > copanic: Bad list head 0xfffff8012534c9a8 first->prev !=3D head > cpuid =3D 6 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe355= 4716370 > vpanic() at vpanic+0x189/frame 0xfffffe35547163f0 > panic() at panic+0x43/frame 0xfffffe3554716450 > zone_import() at zone_import+0x25f/frame 0xfffffe35547164c0 > uma_zalloc_arg() at uma_zalloc_arg+0x3c6/frame 0xfffffe3554716530 > arc_get_data_buf() at arc_get_data_buf+0x36f/frame 0xfffffe3554716580 > arc_buf_alloc() at arc_buf_alloc+0x14f/frame 0xfffffe35547165c0 > arc_read() at arc_read+0x19e/frame 0xfffffe3554716670 > dbuf_read() at dbuf_read+0x804/frame 0xfffffe3554716710 > dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x1a0/fram= e 0xfffffe3554716780 > dmu_read_uio_dnode() at dmu_read_uio_dnode+0x40/frame 0xfffffe35547167f= 0 > dmu_read_uio_dbuf() at dmu_read_uio_dbuf+0x3b/frame 0xfffffe3554716820 > zfs_freebsd_read() at zfs_freebsd_read+0x464/frame 0xfffffe35547168c0 > VOP_READ_APV() at VOP_READ_APV+0x114/frame 0xfffffe35547168f0 > vn_read() at vn_read+0x247/frame 0xfffffe3554716970 > vn_io_fault() at vn_io_fault+0x10a/frame 0xfffffe35547169f0 > dofileread() at dofileread+0x95/frame 0xfffffe3554716a40 > kern_readv() at kern_readv+0x68/frame 0xfffffe3554716a90 > sys_read() at sys_read+0x63/frame 0xfffffe3554716ae0 > amd64_syscall() at amd64_syscall+0x282/frame 0xfffffe3554716bf0 > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe3554716bf0 > --- syscall (3, FreeBSD ELF64, sys_read), rip =3D 0x8030ad3ea, rsp =3D = 0x7fffffffdcd8, rbp =3D 0x7fffffffdd10 --- > KDB: enter: panic > [ thread pid 969 tid 100909 ] > Stopped at kdb_enter+0x3e: movq $0,kdb_why --=20 Regards, Bryan Drewery --lhUlwXwEe6V91RlxhnP7dlf4OpuvrLn8Q Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBAgAGBQJV3M/AAAoJEDXXcbtuRpfP9SQIANWm9MCiHjCqn1qif3zwgE+g jdAiZZtQ3i4p4Qj7iq5m4GnMUaULmdclid9WaobOBXfzhiaAbz0ZrYgWaa1F5ljo A/T1EM/R3uro4Dv7WLKCD2yH49VMQjIGW8dJ3MNaObmvjVb2bHDfLBWfJUOgOHTM KkJmsv8EpUJue/ZCRLxDDfLewp43NMV7lQ/DFgytQeuwNlwipHTCZKaePw0yypam 6I5PXleCVYOrvINnJpQ/nxrRTVTfNyYC/urG3OPR6DGvIUMID/RCY6173/AnT0mS ghRb+4m6NLkzznGObEvX15azuKSQIq3q2OiEwIQGqqxxEe5GtY8G7ffe4KsJ46k= =+Kmd -----END PGP SIGNATURE----- --lhUlwXwEe6V91RlxhnP7dlf4OpuvrLn8Q-- From owner-freebsd-fs@freebsd.org Thu Aug 27 12:05:09 2015 Return-Path: Delivered-To: 
freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5E2289C49DE for ; Thu, 27 Aug 2015 12:05:09 +0000 (UTC) (envelope-from zeus@ibs.dn.ua) Received: from smtp.new-ukraine.org (smtp.new-ukraine.org [148.251.53.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "smtp.new-ukraine.org", Issuer "smtp.new-ukraine.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 05E8D1D26 for ; Thu, 27 Aug 2015 12:05:08 +0000 (UTC) (envelope-from zeus@ibs.dn.ua) Received: on behalf of honored client by smtp.new-ukraine.org with ESMTP id t7RC0bWP010827 for on Thu, 27 Aug 2015 15:00:44 +0300 (EEST) Message-ID: <20150827150027.10825@smtp.new-ukraine.org> Date: Thu, 27 Aug 2015 15:00:27 +0300 From: "Zeus Panchenko" To: Subject: can zfs snapshot be used to back LUN in ctl.conf ? Organization: I.B.S. LLC Reply-To: "Zeus Panchenko" X-Attribution: zeus Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAFVBMVEWxsbGdnZ3U1NQTExN cXFzx8fG/v7+f8hyWAAACXUlEQVQ4jUWSwXYiIRBFi4yyhtjtWpmRdTL0ZC3TJOukDa6Rc+T/P2F eFepwtFvr8upVFVDua8mLWw6La4VIKTuMdAPOebdU55sQs3n/D1xFFPFGVGh4AHKttr5K0bS6g7N ZCge7qpVLB+f1Z2WAj2OKXwIWt/bXpdXSiu8KXbviWkHxF5td9+lg2e3xlI2SCvatK8YLfHyh9lw 15yrad8Va5eXg4Llr7QmAaC+dL9sDt9iad/DX3OKvLMBf+dm0A0QuMrTvYIevSik1IaSVvgjIHt5 lSCG2ynNRpEcBZ8cgDWk+Ns99qzsYYV3MZoppWzGtYlTO9+meG6m/g92iNO9LfQB2JZsMpoJs7QG ku2KtabRK0bZRwDLyBDvwlxTm6ZlP7qyOqLcfqtLexpDSB4M0H3I/PQy1emvjjzgK+A0LmMKl6Lq zlqzh0VGAw440F6MJd8cY0nI7wiF/fVIBGY7UNCAXy6DmfYGCLLI0wtDbVcDUMqtJLmAhLqODQAe riERAxXJ1/QYGpa0ymqyytpKC19MNXHjvFmEsfcHIrncFR4xdbYWgmfEGLCcZokpGbGj1egMR+6M 1BkNX1pDdhPcOXpAnAeLQUwQLYepgQoZVNGS61yaE8CYA7gYAcWKzwGstACY2HTFvvOwk4FXAG/a mKHni/EcA/GkOk7I0IK7UMIf3+SahU8/FJdiE7KcuWdM3MFocUDEEIX9LfJoo4xV5tnNKc3jJuSs SZWgnnhepgU1zN4Hii18yW4RwDX52CXUtk0Hqz6cHOIUkWaX8fDcB+J7y1y2xDHwjv/8Buu8Ekz6 7tXQAAAAASUVORK5CYII= X-Mailer: MH-E 8.3.1; GNU Mailutils 2.99.98; GNU Emacs 24.3.1 MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: quoted-printable X-NewUkraine-Agent: mailfromd (7.99.92) X-NewUkraine-URL: https://mail.prozora-kraina.org/smtp.html X-NewUkraine-VirStat: NO X-NewUkraine-VirScan: ScanPE, ScanELF, ScanOLE2, ScanMail, PhishingSignatures, ScanHTML, ScanPDF X-NewUkraine-SpamStat: NO X-NewUkraine-SpamScore: -1.600 of 3.500 X-NewUkraine-SpamKeys: AWL,BAYES_00,NO_RECEIVED,NO_RELAYS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 12:05:09 -0000 =2D----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 greetings, please help me to understand where to look at ... recently I switched from istgt to ctld and now wonder, whether can zfs snapshot be used to back the LUN for ctld? istgt do allows that, while ctld fails to start and complains but if I copy the file from zfs snapshot to some place, then ctld starts as expected ... bellow the details are: =2D ---[ ctld debug quotation start ]--------------------------------------= ----- ... 
ctld: adding lun 0, target iqn.2007-09.jp.ne.peach.istgt:file011
ctld: adding lun 0, target iqn.2007-09.jp.ne.peach.istgt:file012
ctld: error returned from LUN creation request: ctl_be_block_open: error opening /storage/win/.zfs/snapshot/daily-2015-08-22/file013
ctld: failed to add lun 0, target iqn.2007-09.jp.ne.peach.istgt:file013
ctld: adding lun 0, target iqn.2007-09.jp.ne.peach.istgt:file014
ctld: adding lun 0, target iqn.2007-09.jp.ne.peach.istgt:file015
...
ctld: adding lun 0, target iqn.2007-09.jp.ne.peach.istgt:file052
ctld: adding lun 0, target iqn.2007-09.jp.ne.peach.istgt:file053
ctld: not listening on portal-group "default", not assigned to any target
ctld: listening on 10.100.21.47, portal-group "alfa"
ctld: listening on 10.100.21.47, portal-group "beta"
ctld: failed to apply configuration; exiting
/etc/rc.d/ctld: WARNING: failed to start ctld
---[ ctld debug quotation end ]---------------------------------------------

---[ ctl.conf quotation start ]-------------------------------------------
target iqn.2007-09.jp.ne.peach.istgt:file013 {
        alias "file013-users"
        portal-group alfa
        auth-group ag-file013
        lun 0 {
                path /storage/win/.zfs/snapshot/daily-2015-08-22/file013
                size 300G
        }
}
---[ ctl.conf quotation end ]-------------------------------------------

the very file exists:

> stat /storagez/win/.zfs/snapshot/daily-2015-08-22/traders.ts.ibs
3500296891 22 -rw-r--r-- 1 root wheel 4294967295 214748364800 "Oct 23 07:41:38 2013" "Aug 21 04:00:28 2015" "Aug 21 04:00:28 2015" "Oct 23 07:41:38 2013" 131072 419838466 0x800 /storage/win/.zfs/snapshot/daily-2015-08-22/file013

another question: can ctld be configured to ignore unavailable config parts? like unaccessible/missconfigured LUNs

--
Zeus V. Panchenko jid:zeus@im.ibs.dn.ua
IT Dpt., I.B.S. LLC GMT+2 (EET)

From owner-freebsd-fs@freebsd.org Thu Aug 27 19:53:46 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AEDE59C4BA7; Thu, 27 Aug 2015 19:53:46 +0000 (UTC) (envelope-from milios@ccsys.com) Received: from cargobay.net (cargobay.net [198.178.123.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8B5E81FB8; Thu, 27 Aug 2015 19:53:46 +0000 (UTC) (envelope-from milios@ccsys.com) Received: from [192.168.0.2] (cblmdm72-240-160-19.buckeyecom.net [72.240.160.19]) by cargobay.net (Postfix) with ESMTPSA id 0D69AD31; Thu, 27 Aug 2015 19:49:59 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Options for zfs inside a VM backed by zfs on the host From: "Chad J. 
Milios" In-Reply-To: <55DF46F5.4070406@redbarn.org> Date: Thu, 27 Aug 2015 15:53:42 -0400 Cc: Matt Churchyard , Vick Khera , allanjude@freebsd.org, "freebsd-virtualization@freebsd.org" , freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com> References: <20150827061044.GA10221@blazingdot.com> <20150827062015.GA10272@blazingdot.com> <1a6745e27d184bb99eca7fdbdc90c8b5@SERVER.ad.usd-group.com> <55DF46F5.4070406@redbarn.org> To: Paul Vixie X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 19:53:46 -0000 > On Aug 27, 2015, at 10:46 AM, Allan Jude = wrote: >=20 > On 2015-08-27 02:10, Marcus Reid wrote: >> On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote: >>> I'm running FreeBSD inside a VM that is providing the virtual disks = backed >>> by several ZFS zvols on the host. I want to run ZFS on the VM itself = too >>> for simplified management and backup purposes. >>>=20 >>> The question I have is on the VM guest, do I really need to run a = raid-z or >>> mirror or can I just use a single virtual disk (or even a stripe)? = Given >>> that the underlying storage for the virtual disk is a zvol on a = raid-z >>> there should not really be too much worry for data corruption, I = would >>> think. It would be equivalent to using a hardware raid for each = component >>> of my zfs pool. >>>=20 >>> Opinions? Preferably well-reasoned ones. :) >>=20 >> This is a frustrating situation, because none of the options that I = can >> think of look particularly appealing. Single-vdev pools would be the >> best option, your redundancy is already taken care of by the host's >> pool. The overhead of checksumming, etc. twice is probably not super >> bad. However, having the ARC eating up lots of memory twice seems >> pretty bletcherous. You can probably do some tuning to reduce that, = but >> I never liked tuning the ARC much. >>=20 >> All the nice features ZFS brings to the table is hard to give up once >> you get used to having them around, so I understand your quandry. >>=20 >> Marcus >=20 > You can just: >=20 > zfs set primarycache=3Dmetadata poolname >=20 > And it will only cache metadata in the ARC inside the VM, and avoid > caching data blocks, which will be cached outside the VM. You could = even > turn the primarycache off entirely. >=20 > --=20 > Allan Jude > On Aug 27, 2015, at 1:20 PM, Paul Vixie wrote: >=20 > let me ask a related question: i'm using FFS in the guest, zvol on the > host. should i be telling my guest kernel to not bother with an FFS > buffer cache at all, or to use a smaller one, or what? Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately there = are really no simple answers. You must consider your use case, the host = and vm hardware/software configuration, perform meaningful benchmarks = and, if you care about data integrity, thorough tests of the likely = failure modes (all far more easily said than done). I=E2=80=99m curious = to hear more about your use case(s) and setups so as to offer better = insight on what alternatives may make more/less sense for you. = Performance needs? Are you striving for lower individual latency or = higher combined throughput? How critical are integrity and availability? = How do you prefer your backup routine? Do you handle that in guest or = host? 
Want features like dedup and/or L2ARC up in the mix? (Then = everything bears reconsideration, just about triple your research and = testing efforts.) Sorry, I=E2=80=99m really not trying to scare anyone away from ZFS. It = is awesome and capable of providing amazing solutions with very reliable = and sensible behavior if handled with due respect, fear, monitoring and = upkeep. :) There are cases to be made for caching [meta-]data in the child, in the = parent, checksumming in the child/parent/both, compressing in the = child/parent. I believe `gstat` along with your custom-made benchmark or = test load will greatly help guide you. ZFS on ZFS seems to be a hardly studied, seldom reported, never = documented, tedious exercise. Prepare for accelerated greying and = balding of your hair. The parent's volblocksize, child's ashift, = alignment, interactions involving raidz stripes (if used) can lead to = problems from slightly decreased performance and storage efficiency to = pathological write amplification within ZFS, performance and = responsiveness crashing and sinking to the bottom of the ocean. Some = datasets can become veritable black holes to vfs system calls. You may = see ZFS reporting elusive errors, deadlocking or panicing in the child = or parent altogether. With diligence though, stable and performant = setups can be discovered for many production situations. For example, for a zpool (whether used by a VM or not, locally, thru = iscsi, ggate[cd], or whatever) atop zvol which sits on parent zpool with = no redundancy, I would set primarycache=3Dmetadata checksum=3Doff = compression=3Doff for the zvol(s) on the host(s) and for the most part = just use the same zpool settings and sysctl tunings in the VM (or child = zpool, whatever role it may conduct) that i would otherwise use on bare = cpu and bare drives (defaults + compression=3Dlz4 atime=3Doff). However, = that simple case is likely not yours. With ufs/ffs/ntfs/ext4 and most other filesystems atop a zvol i use = checksums on the parent zvol, and compression too if the child doesn=E2=80= =99t support it (as ntfs can), but still caching only metadata on the = host and letting the child vm/fs cache real data. My use case involves charging customers for their memory use so = admittedly that is one motivating factor, LOL. Plus, i certainly don=E2=80= =99t want one rude VM marching through host ARC unfairly evacuating and = starving the other polite neighbors. VM=E2=80=99s swap space becomes another consideration and I treat it = like any other =E2=80=98dumb=E2=80=99 filesystem with compression and = checksumming done by the parent but recent versions of many operating = systems may be paging out only already compressed data, so investigate = your guest OS. I=E2=80=99ve found lz4=E2=80=99s claims of an = almost-no-penalty early-abort to be vastly overstated when dealing with = zvols, small block sizes and high throughput so if you can be certain = you=E2=80=99ll be dealing with only compressed data then turn it off. = For the virtual memory pagers in most current-day OS=E2=80=99s though = set compression on the swap=E2=80=99s backing zvol to lz4. Another factor is the ZIL. One VM can hoard your synchronous write = performance. Solutions are beyond the scope of this already-too-long = email :) but I=E2=80=99d be happy to elaborate if queried. And then there=E2=80=99s always netbooting guests from NFS mounts served = by the host and giving the guest no virtual disks, don=E2=80=99t forget = to consider that option. 
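(To make the zvol-backed case above concrete, a minimal sketch of the host-side properties described; the pool and volume names are invented:)

# host: create the backing zvol and let the guest do its own data caching
zfs create -V 100G -o volblocksize=8k tank/vm/guest0-disk0
zfs set primarycache=metadata tank/vm/guest0-disk0
zfs set checksum=off tank/vm/guest0-disk0
zfs set compression=off tank/vm/guest0-disk0

# guest: roughly the same defaults one would use on bare metal
zfs set compression=lz4 zroot
zfs set atime=off zroot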
Hope this provokes some fruitful ideas for you. Glad to philosophize = about ZFS setups with ya=E2=80=99ll :) -chad= From owner-freebsd-fs@freebsd.org Thu Aug 27 20:22:58 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 873789C36EC for ; Thu, 27 Aug 2015 20:22:58 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "NewFS.denninger.net", Issuer "NewFS.denninger.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 3250D132E for ; Thu, 27 Aug 2015 20:22:57 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (localhost [127.0.0.1]) by fs.denninger.net (8.15.2/8.14.8) with ESMTP id t7RKMnbt059314 for ; Thu, 27 Aug 2015 15:22:50 -0500 (CDT) (envelope-from karl@denninger.net) Received: from [192.168.1.40] [192.168.1.40] (Via SSLv3 AES128-SHA) ; by Spamblock-sys (LOCAL/AUTH) Thu Aug 27 15:22:49 2015 Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed) To: freebsd-fs@freebsd.org References: <55BB443E.8040801@denninger.net> <55CF7926.1030901@denninger.net> From: Karl Denninger X-Enigmail-Draft-Status: N1110 Message-ID: <55DF7191.2080409@denninger.net> Date: Thu, 27 Aug 2015 15:22:41 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <55CF7926.1030901@denninger.net> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms000808020402070505000800" X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 20:22:58 -0000 This is a cryptographically signed message in MIME format. --------------ms000808020402070505000800 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 8/15/2015 12:38, Karl Denninger wrote: > Update: > > This /appears /to be related to attempting to send or receive a > /cloned /snapshot. > > I use /beadm /to manage boot environments and the crashes have all > come while send/recv-ing the root pool, which is the one where these > clones get created. It is /not /consistent within a given snapshot > when it crashes and a second attempt (which does a "recovery" > send/receive) succeeds every time -- I've yet to have it panic twice > sequentially. > > I surmise that the problem comes about when a file in the cloned > snapshot is modified, but this is a guess at this point. > > I'm going to try to force replication of the problem on my test system.= > > On 7/31/2015 04:47, Karl Denninger wrote: >> I have an automated script that runs zfs send/recv copies to bring a >> backup data set into congruence with the running copies nightly. The >> source has automated snapshots running on a fairly frequent basis >> through zfs-auto-snapshot. >> >> Recently I have started having a panic show up about once a week durin= g >> the backup run, but it's inconsistent. It is in the same place, but I= >> cannot force it to repeat. 
>> >> The trap itself is a page fault in kernel mode in the zfs code at >> zfs_unmount_snap(); here's the traceback from the kvm (sorry for the >> image link but I don't have a better option right now.) >> >> I'll try to get a dump, this is a production machine with encrypted sw= ap >> so it's not normally turned on. >> >> Note that the pool that appears to be involved (the backup pool) has >> passed a scrub and thus I would assume the on-disk structure is ok....= =2E >> but that might be an unfair assumption. It is always occurring in the= >> same dataset although there are a half-dozen that are sync'd -- if thi= s >> one (the first one) successfully completes during the run then all the= >> rest will as well (that is, whenever I restart the process it has alwa= ys >> failed here.) The source pool is also clean and passes a scrub. >> >> traceback is at http://www.denninger.net/kvmimage.png; apologies for t= he >> image traceback but this is coming from a remote KVM. >> >> I first saw this on 10.1-STABLE and it is still happening on FreeBSD >> 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see i= f >> the problem was something that had been addressed. >> >> > > --=20 > Karl Denninger > karl@denninger.net > /The Market Ticker/ > /[S/MIME encrypted email preferred]/ Second update: I have now taken another panic on 10.2-Stable, same deal, but without any cloned snapshots in the source image. I had thought that removing cloned snapshots might eliminate the issue; that is now out the window. It ONLY happens on this one filesystem (the root one, incidentally) which is fairly-recently created as I moved this machine from spinning rust to SSDs for the OS and root pool -- and only when it is being backed up by using zfs send | zfs recv (with the receive going to a different pool in the same machine.) I have yet to be able to provoke it when using zfs send to copy to a different machine on the same LAN, but given that it is not able to be reproduced on demand I can't be certain it's timing related (e.g. performance between the two pools in question) or just that I haven't hit the unlucky combination. 
This looks like some sort of race condition and I will continue to see if I can craft a case to make it occur "on demand" --=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms000808020402070505000800 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp 3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5 vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/ o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6 eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+ JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ 3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0 FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG 1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5 c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNTA4MjcyMDIyNDFaME8GCSqGSIb3DQEJBDFCBECb Dgr+d2nEYlHfHUt98UBNkMqjQ/Fyo5PexVssCqJTqjGMgr0Sqs4QGOWhffL3WPmBCeOy0bfQ 5kyUBozIN66oMGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk 
YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1 ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG 9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAmb5WzQI7 lN1Y6vYI52qpPhnJ55u5LIwAJCPpsyNCJXlYh76Wu399RmyMbjc2LjGJmYSx6rpzK4O9j8BX k/VbKPNbJ5gWjJyyVXO4RRCTLNczCNU8JkboCcnq8eUBlL/4V/QoviDjDgzQKQ75pCY9M0JW a6vs2X6bcyVzF05clAgeHGh+/aQhSFToLXyqpitHNixjlWUDSdJ7o7EOcP9y4iWlQzgCjamZ fQrjlCZLPm64PmJ+Jy//lfuwFThYvhErQi8SA/kNg8A+GFsTUwlhSXMEQ4n9KhlhQDGFUUhb kNztItLgtgdLwsuUjwKJ+yc/PyUr1F9F3S9QR3lfc7JgyZYiwXGx+w40aEuR1jb3YSray3uM lw0SEAAoNh2Mi2i2/rJIEjFqLsFiJo01wEvWqkUWPcxeG9sgODL6DoafzJM1fSw6Rz09gLAv uzYj8+HYrEcfEvga2Ayi6ypZ/trBcbBdhHDgTVPqZ8GEAJOVFjpqhHDqVtX2tUN+cksJhvLo /1DYLfwJc2SViApUx5GM9Xc7q7efrvz0m14/ylKbZUUlPfbfN30bb1GTuLD5eoweuasXflRY HczVBhmMlZi9P+Stlwvb3QSWcIttXSjVUJaqOAEK93kg0odoNe8CDA/U21w/zUSK46gyiJSc +QrAkX69arUQ5RfsOQIp2Xk5MdgAAAAAAAA= --------------ms000808020402070505000800-- From owner-freebsd-fs@freebsd.org Thu Aug 27 20:30:39 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0C0589C393A for ; Thu, 27 Aug 2015 20:30:39 +0000 (UTC) (envelope-from sean@chittenden.org) Received: from mail01.lax1.stackjet.com (mon01.lax1.stackjet.com [174.136.104.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E7FF917B4 for ; Thu, 27 Aug 2015 20:30:37 +0000 (UTC) (envelope-from sean@chittenden.org) Received: from hormesis.local (localhost [127.0.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) (Authenticated sender: sean@chittenden.org) by mail01.lax1.stackjet.com (Postfix) with ESMTPSA id AB8833E8E5A; Thu, 27 Aug 2015 13:30:30 -0700 (PDT) Received: from hormesis.local ([173.228.13.241] helo=hormesis.local) by ASSP.nospam with SMTPS(ECDHE-RSA-AES256-SHA) (2.4.2); 27 Aug 2015 13:30:28 -0700 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed) From: Sean Chittenden In-Reply-To: <55DF7191.2080409@denninger.net> Date: Thu, 27 Aug 2015 13:30:24 -0700 Cc: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <55BB443E.8040801@denninger.net> <55CF7926.1030901@denninger.net> <55DF7191.2080409@denninger.net> To: Karl Denninger X-Mailer: Apple Mail (2.2104) X-Assp-Version: 2.4.2(14097) on ASSP.nospam X-Assp-ID: ASSP.nospam m1-07430-05968 X-Assp-Session: 844144288 (mail 1) X-Assp-Envelope-From: sean@chittenden.org X-Assp-Intended-For: karl@denninger.net X-Assp-Intended-For: freebsd-fs@freebsd.org X-Assp-Client-TLS: yes X-Assp-Server-TLS: yes X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 20:30:39 -0000 Have you tried disabling TRIM? We recently ran in to an issue where a = `zfs delete` on a large dataset caused the host to panic because TRIM = was tripping over the ZFS deadman timer. Disabling TRIM worked as = valid workaround for us. ? 
You mentioned a recent move to SSDs, so = this can happen, esp after the drive has experienced a little bit of = actual work. ? -sc -- Sean Chittenden sean@chittenden.org > On Aug 27, 2015, at 13:22, Karl Denninger wrote: >=20 > On 8/15/2015 12:38, Karl Denninger wrote: >> Update: >>=20 >> This /appears /to be related to attempting to send or receive a >> /cloned /snapshot. >>=20 >> I use /beadm /to manage boot environments and the crashes have all >> come while send/recv-ing the root pool, which is the one where these >> clones get created. It is /not /consistent within a given snapshot >> when it crashes and a second attempt (which does a "recovery" >> send/receive) succeeds every time -- I've yet to have it panic twice >> sequentially. >>=20 >> I surmise that the problem comes about when a file in the cloned >> snapshot is modified, but this is a guess at this point. >>=20 >> I'm going to try to force replication of the problem on my test = system. >>=20 >> On 7/31/2015 04:47, Karl Denninger wrote: >>> I have an automated script that runs zfs send/recv copies to bring a >>> backup data set into congruence with the running copies nightly. = The >>> source has automated snapshots running on a fairly frequent basis >>> through zfs-auto-snapshot. >>>=20 >>> Recently I have started having a panic show up about once a week = during >>> the backup run, but it's inconsistent. It is in the same place, but = I >>> cannot force it to repeat. >>>=20 >>> The trap itself is a page fault in kernel mode in the zfs code at >>> zfs_unmount_snap(); here's the traceback from the kvm (sorry for the >>> image link but I don't have a better option right now.) >>>=20 >>> I'll try to get a dump, this is a production machine with encrypted = swap >>> so it's not normally turned on. >>>=20 >>> Note that the pool that appears to be involved (the backup pool) has >>> passed a scrub and thus I would assume the on-disk structure is = ok..... >>> but that might be an unfair assumption. It is always occurring in = the >>> same dataset although there are a half-dozen that are sync'd -- if = this >>> one (the first one) successfully completes during the run then all = the >>> rest will as well (that is, whenever I restart the process it has = always >>> failed here.) The source pool is also clean and passes a scrub. >>>=20 >>> traceback is at http://www.denninger.net/kvmimage.png; apologies for = the >>> image traceback but this is coming from a remote KVM. >>>=20 >>> I first saw this on 10.1-STABLE and it is still happening on FreeBSD >>> 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see = if >>> the problem was something that had been addressed. >>>=20 >>>=20 >>=20 >> --=20 >> Karl Denninger >> karl@denninger.net >> /The Market Ticker/ >> /[S/MIME encrypted email preferred]/ >=20 > Second update: I have now taken another panic on 10.2-Stable, same = deal, > but without any cloned snapshots in the source image. I had thought = that > removing cloned snapshots might eliminate the issue; that is now out = the > window. >=20 > It ONLY happens on this one filesystem (the root one, incidentally) > which is fairly-recently created as I moved this machine from spinning > rust to SSDs for the OS and root pool -- and only when it is being > backed up by using zfs send | zfs recv (with the receive going to a > different pool in the same machine.) 
I have yet to be able to provoke > it when using zfs send to copy to a different machine on the same LAN, > but given that it is not able to be reproduced on demand I can't be > certain it's timing related (e.g. performance between the two pools in > question) or just that I haven't hit the unlucky combination. >=20 > This looks like some sort of race condition and I will continue to see > if I can craft a case to make it occur "on demand" >=20 > --=20 > Karl Denninger > karl@denninger.net > /The Market Ticker/ > /[S/MIME encrypted email preferred]/ From owner-freebsd-fs@freebsd.org Thu Aug 27 20:44:37 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0F77E9C3EC8 for ; Thu, 27 Aug 2015 20:44:37 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "NewFS.denninger.net", Issuer "NewFS.denninger.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id A57071F88 for ; Thu, 27 Aug 2015 20:44:36 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (localhost [127.0.0.1]) by fs.denninger.net (8.15.2/8.14.8) with ESMTP id t7RKiZ6b066537 for ; Thu, 27 Aug 2015 15:44:35 -0500 (CDT) (envelope-from karl@denninger.net) Received: from [192.168.1.40] [192.168.1.40] (Via SSLv3 AES128-SHA) ; by Spamblock-sys (LOCAL/AUTH) Thu Aug 27 15:44:35 2015 Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed) To: Sean Chittenden References: <55BB443E.8040801@denninger.net> <55CF7926.1030901@denninger.net> <55DF7191.2080409@denninger.net> Cc: freebsd-fs@freebsd.org From: Karl Denninger X-Enigmail-Draft-Status: N1110 Message-ID: <55DF76AA.3040103@denninger.net> Date: Thu, 27 Aug 2015 15:44:26 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms030109030707060501020908" X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 20:44:37 -0000 This is a cryptographically signed message in MIME format. --------------ms030109030707060501020908 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable No, but that does sound like it might be involved..... And yeah, this did start when I moved the root pool to a mirrored pair of Intel 530s off a pair of spinning-rust WD RE4s.... (The 530s are darn nice performance-wise, reasonably inexpensive and thus very suitable for a root filesystem drive and they also pass the "pull the power cord" test, incidentally.) You may be onto something -- I'll try shutting it off, but due to the fact that I can't make this happen and it's a "every week or two" panic, but ALWAYS when the zfs send | zfs recv is running AND it's always on the same filesystem it will be a fair while before I know if it's fixed (like over a month, given the usual pattern here, as that would be 4 "average" periods without a panic)..... I also wonder if I could tune this out with some of the other TRIM parameters instead of losing it entirely. 
vfs.zfs.trim.max_interval: 1 vfs.zfs.trim.timeout: 30 vfs.zfs.trim.txg_delay: 32 vfs.zfs.trim.enabled: 1 vfs.zfs.vdev.trim_max_pending: 10000 vfs.zfs.vdev.trim_max_active: 64 vfs.zfs.vdev.trim_min_active: 1 That it's panic'ing on a mtx_lock_sleep might point this way.... the trace shows it coming from a zfs_onexit_destroy, which ends up calling zfs_unmount_snap() and then it blows in dounmount() while executing mtx_lock_sleep(). I do wonder if I'm begging for new and innovative performance issues if I run with TRIM off for an extended period of time, however..... :-) On 8/27/2015 15:30, Sean Chittenden wrote: > Have you tried disabling TRIM? We recently ran in to an issue where a = `zfs delete` on a large dataset caused the host to panic because TRIM was= tripping over the ZFS deadman timer. Disabling TRIM worked as valid wo= rkaround for us. ? You mentioned a recent move to SSDs, so this can hap= pen, esp after the drive has experienced a little bit of actual work. ? = -sc > > > -- > Sean Chittenden > sean@chittenden.org > > >> On Aug 27, 2015, at 13:22, Karl Denninger wrote: >> >> On 8/15/2015 12:38, Karl Denninger wrote: >>> Update: >>> >>> This /appears /to be related to attempting to send or receive a >>> /cloned /snapshot. >>> >>> I use /beadm /to manage boot environments and the crashes have all >>> come while send/recv-ing the root pool, which is the one where these >>> clones get created. It is /not /consistent within a given snapshot >>> when it crashes and a second attempt (which does a "recovery" >>> send/receive) succeeds every time -- I've yet to have it panic twice >>> sequentially. >>> >>> I surmise that the problem comes about when a file in the cloned >>> snapshot is modified, but this is a guess at this point. >>> >>> I'm going to try to force replication of the problem on my test syste= m. >>> >>> On 7/31/2015 04:47, Karl Denninger wrote: >>>> I have an automated script that runs zfs send/recv copies to bring a= >>>> backup data set into congruence with the running copies nightly. Th= e >>>> source has automated snapshots running on a fairly frequent basis >>>> through zfs-auto-snapshot. >>>> >>>> Recently I have started having a panic show up about once a week dur= ing >>>> the backup run, but it's inconsistent. It is in the same place, but= I >>>> cannot force it to repeat. >>>> >>>> The trap itself is a page fault in kernel mode in the zfs code at >>>> zfs_unmount_snap(); here's the traceback from the kvm (sorry for the= >>>> image link but I don't have a better option right now.) >>>> >>>> I'll try to get a dump, this is a production machine with encrypted = swap >>>> so it's not normally turned on. >>>> >>>> Note that the pool that appears to be involved (the backup pool) has= >>>> passed a scrub and thus I would assume the on-disk structure is ok..= =2E.. >>>> but that might be an unfair assumption. It is always occurring in t= he >>>> same dataset although there are a half-dozen that are sync'd -- if t= his >>>> one (the first one) successfully completes during the run then all t= he >>>> rest will as well (that is, whenever I restart the process it has al= ways >>>> failed here.) The source pool is also clean and passes a scrub. >>>> >>>> traceback is at http://www.denninger.net/kvmimage.png; apologies for= the >>>> image traceback but this is coming from a remote KVM. 
>>>> >>>> I first saw this on 10.1-STABLE and it is still happening on FreeBSD= >>>> 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see= if >>>> the problem was something that had been addressed. >>>> >>>> >>> --=20 >>> Karl Denninger >>> karl@denninger.net >>> /The Market Ticker/ >>> /[S/MIME encrypted email preferred]/ >> Second update: I have now taken another panic on 10.2-Stable, same dea= l, >> but without any cloned snapshots in the source image. I had thought th= at >> removing cloned snapshots might eliminate the issue; that is now out t= he >> window. >> >> It ONLY happens on this one filesystem (the root one, incidentally) >> which is fairly-recently created as I moved this machine from spinning= >> rust to SSDs for the OS and root pool -- and only when it is being >> backed up by using zfs send | zfs recv (with the receive going to a >> different pool in the same machine.) I have yet to be able to provoke= >> it when using zfs send to copy to a different machine on the same LAN,= >> but given that it is not able to be reproduced on demand I can't be >> certain it's timing related (e.g. performance between the two pools in= >> question) or just that I haven't hit the unlucky combination. >> >> This looks like some sort of race condition and I will continue to see= >> if I can craft a case to make it occur "on demand" >> >> --=20 >> Karl Denninger >> karl@denninger.net >> /The Market Ticker/ >> /[S/MIME encrypted email preferred]/ > > > %SPAMBLOCK-SYS: Matched [+Sean Chittenden ], messa= ge ok > --=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms030109030707060501020908 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp 3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5 vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/ o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6 
eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+ JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ 3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0 FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG 1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5 c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNTA4MjcyMDQ0MjZaME8GCSqGSIb3DQEJBDFCBECF WlEkAcLqO2f2Z8ZDvK5ecb19GxxKf8dr80mFFcoispyzX4aUDCZubvMlg9EVVArF+i7NFHVg 5oT1ZZ9LoT7TMGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1 ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG 9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAJs/0rpt+ O+ACeKMvpA2q8zRUc3n//fO+J9bOx2Xbsxgd8Rmu9PzAmxDZd16WP3gkD67WTGkXZiEcWJQL 1tIELTf2UwjzuMk7S3bQer9hxsN6PdIotYrPFgU3/2FP2wucTBB9zMTgUaVcCcilshPbyVUE ADQcs8wkOzZAT5yuCNE3EpZMCd2T/BMs1k0gmpMQTP9rzVyaRgOYkn6TVAHKf40PPKL+qJh8 hkNU9mj25zlVXkNgEHQcruWoGYYFDDBg+pHLQNgDzGahMmlFbn/ZFKRsa0PXYtPgPoG13rfc 68LGmhD3KfX8p9Yqhy04FFPb2RIfYYYajCjMgWjLrZ4qozNOpO3xM34QTLtB85K2C0InOLp5 zSVEgjfm9O3ascu5mA1BcQt8OluX4nkpObMusBoyu5fnJJPhXJ4/OLKkU+JMASNtp25MSDM4 ln8KvWBz61vKWRJRXkF3YntfnhffqpwbYKk/3IljZ0Z0m2pTEbEpLYbDapCArOVGsHoIKcTZ BjCz02/64eAA6dh0hMdXCGaPtawZf/7hwWPx7S4ioRrE0vnc/afI9TMZSY+fwLwLtUYUZSgj K2JrKdmbSpC0eTB1MfR+uRLGN9aCasJmVQs2FumLRYRd+6RMn+raKo692is3ZmjJkwQRt3zb qnhnA1qBWBRoCbmSB5hoAILXYcIAAAAAAAA= --------------ms030109030707060501020908-- From owner-freebsd-fs@freebsd.org Thu Aug 27 21:06:27 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A6AD79C45BF for ; Thu, 27 Aug 2015 21:06:27 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id D5D46CBF for ; Thu, 27 Aug 2015 21:06:26 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA19203; Fri, 28 Aug 2015 00:06:25 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZV4NM-000NjV-LJ; Fri, 28 Aug 2015 00:06:24 +0300 Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed) To: Karl Denninger , freebsd-fs@FreeBSD.org References: 
<55BB443E.8040801@denninger.net> <55CF7926.1030901@denninger.net> <55DF7191.2080409@denninger.net> From: Andriy Gapon Message-ID: <55DF7B98.9070902@FreeBSD.org> Date: Fri, 28 Aug 2015 00:05:28 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <55DF7191.2080409@denninger.net> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 21:06:27 -0000 On 27/08/2015 23:22, Karl Denninger wrote: > traceback is at http://www.denninger.net/kvmimage.png; apologies for the > image traceback but this is coming from a remote KVM. Did you manage to get a crash dump? A patch from this review request https://reviews.freebsd.org/D2794 might happen to be a fix for this problem, although originally it was develop to address a different kind of an unmount race. -- Andriy Gapon From owner-freebsd-fs@freebsd.org Thu Aug 27 23:47:26 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B15E9C3C7F; Thu, 27 Aug 2015 23:47:26 +0000 (UTC) (envelope-from tenzin.lhakhang@gmail.com) Received: from mail-lb0-x230.google.com (mail-lb0-x230.google.com [IPv6:2a00:1450:4010:c04::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B1B7095A; Thu, 27 Aug 2015 23:47:25 +0000 (UTC) (envelope-from tenzin.lhakhang@gmail.com) Received: by lbbtg9 with SMTP id tg9so20949419lbb.1; Thu, 27 Aug 2015 16:47:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=aJtSgo6ud7CKF2drcW7Uf7J85ELv4p/hDDkXHJEN4Vw=; b=v24aQ9+lheQSC9r1U8S0DgGaMcWVSF5sbuwqRGhN0gNgrXXBTFZXLXZe9m6/78GX1l 3wWHD+JjIcV4YhhLa67Wz2KQleoSF9iQH7sysNuABezzlX//D6qvZfctEYScKAOTKSBY RD4pyesOIcyn75B83RiQ4CigMvBymv4fF+Ox4e/Z8T4R09YMsHZzpME6SU3EAPEzdonM 81qJP/s1kil0TPaPuA2Zfu02li+kj47vf1hcoFgiTiLJKzr9LtGSLKMapQ0vqYa89A6D ecD89X/u8wuGIo1IBcJRDSHa2vKD1ZK+Soo5NMiD1bt2ugFnBbwPZIOerAnjDTFkP8lk qX9g== MIME-Version: 1.0 X-Received: by 10.112.204.162 with SMTP id kz2mr3414817lbc.115.1440719242475; Thu, 27 Aug 2015 16:47:22 -0700 (PDT) Received: by 10.25.127.9 with HTTP; Thu, 27 Aug 2015 16:47:22 -0700 (PDT) In-Reply-To: <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com> References: <20150827061044.GA10221@blazingdot.com> <20150827062015.GA10272@blazingdot.com> <1a6745e27d184bb99eca7fdbdc90c8b5@SERVER.ad.usd-group.com> <55DF46F5.4070406@redbarn.org> <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com> Date: Thu, 27 Aug 2015 19:47:22 -0400 Message-ID: Subject: Re: Options for zfs inside a VM backed by zfs on the host From: Tenzin Lhakhang To: "Chad J. 
Milios" Cc: Paul Vixie , freebsd-fs@freebsd.org, Vick Khera , Matt Churchyard , "freebsd-virtualization@freebsd.org" , allanjude@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Aug 2015 23:47:26 -0000 That was a really awesome read! The idea of turning metadata on at the backend zpool and then data on the VM was interesting, I will give that a try. Please can you elaborate more on the ZILs and synchronous writes by VMs.. that seems like a great topic. - I am right now exploring the question: are SSD ZILs necessary in an all SSD pool? and then the question of NVMe SSD ZILs onto of an all SSD pool. My guess at the moment is that SSD ZILs are not necessary at all in an SSD pool during intensive IO. I've been told that ZILs are always there to help you, but when your pool aggregate IOPs is greater than the a ZIL, it doesn't seem to make sense.. Or is it the latency of writing to a single disk vs striping across your "fast" vdevs? Thanks, Tenzin On Thu, Aug 27, 2015 at 3:53 PM, Chad J. Milios wrote: > > On Aug 27, 2015, at 10:46 AM, Allan Jude wrote: > > > > On 2015-08-27 02:10, Marcus Reid wrote: > >> On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote: > >>> I'm running FreeBSD inside a VM that is providing the virtual disks > backed > >>> by several ZFS zvols on the host. I want to run ZFS on the VM itself > too > >>> for simplified management and backup purposes. > >>> > >>> The question I have is on the VM guest, do I really need to run a > raid-z or > >>> mirror or can I just use a single virtual disk (or even a stripe)? > Given > >>> that the underlying storage for the virtual disk is a zvol on a raid-= z > >>> there should not really be too much worry for data corruption, I woul= d > >>> think. It would be equivalent to using a hardware raid for each > component > >>> of my zfs pool. > >>> > >>> Opinions? Preferably well-reasoned ones. :) > >> > >> This is a frustrating situation, because none of the options that I ca= n > >> think of look particularly appealing. Single-vdev pools would be the > >> best option, your redundancy is already taken care of by the host's > >> pool. The overhead of checksumming, etc. twice is probably not super > >> bad. However, having the ARC eating up lots of memory twice seems > >> pretty bletcherous. You can probably do some tuning to reduce that, b= ut > >> I never liked tuning the ARC much. > >> > >> All the nice features ZFS brings to the table is hard to give up once > >> you get used to having them around, so I understand your quandry. > >> > >> Marcus > > > > You can just: > > > > zfs set primarycache=3Dmetadata poolname > > > > And it will only cache metadata in the ARC inside the VM, and avoid > > caching data blocks, which will be cached outside the VM. You could eve= n > > turn the primarycache off entirely. > > > > -- > > Allan Jude > > > On Aug 27, 2015, at 1:20 PM, Paul Vixie wrote: > > > > let me ask a related question: i'm using FFS in the guest, zvol on the > > host. should i be telling my guest kernel to not bother with an FFS > > buffer cache at all, or to use a smaller one, or what? > > > Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately there > are really no simple answers. 
You must consider your use case, the host a= nd > vm hardware/software configuration, perform meaningful benchmarks and, if > you care about data integrity, thorough tests of the likely failure modes > (all far more easily said than done). I=E2=80=99m curious to hear more ab= out your > use case(s) and setups so as to offer better insight on what alternatives > may make more/less sense for you. Performance needs? Are you striving for > lower individual latency or higher combined throughput? How critical are > integrity and availability? How do you prefer your backup routine? Do you > handle that in guest or host? Want features like dedup and/or L2ARC up in > the mix? (Then everything bears reconsideration, just about triple your > research and testing efforts.) > > Sorry, I=E2=80=99m really not trying to scare anyone away from ZFS. It is= awesome > and capable of providing amazing solutions with very reliable and sensibl= e > behavior if handled with due respect, fear, monitoring and upkeep. :) > > There are cases to be made for caching [meta-]data in the child, in the > parent, checksumming in the child/parent/both, compressing in the > child/parent. I believe `gstat` along with your custom-made benchmark or > test load will greatly help guide you. > > ZFS on ZFS seems to be a hardly studied, seldom reported, never > documented, tedious exercise. Prepare for accelerated greying and balding > of your hair. The parent's volblocksize, child's ashift, alignment, > interactions involving raidz stripes (if used) can lead to problems from > slightly decreased performance and storage efficiency to pathological wri= te > amplification within ZFS, performance and responsiveness crashing and > sinking to the bottom of the ocean. Some datasets can become veritable > black holes to vfs system calls. You may see ZFS reporting elusive errors= , > deadlocking or panicing in the child or parent altogether. With diligence > though, stable and performant setups can be discovered for many productio= n > situations. > > For example, for a zpool (whether used by a VM or not, locally, thru > iscsi, ggate[cd], or whatever) atop zvol which sits on parent zpool with = no > redundancy, I would set primarycache=3Dmetadata checksum=3Doff compressio= n=3Doff > for the zvol(s) on the host(s) and for the most part just use the same > zpool settings and sysctl tunings in the VM (or child zpool, whatever rol= e > it may conduct) that i would otherwise use on bare cpu and bare drives > (defaults + compression=3Dlz4 atime=3Doff). However, that simple case is = likely > not yours. > > With ufs/ffs/ntfs/ext4 and most other filesystems atop a zvol i use > checksums on the parent zvol, and compression too if the child doesn=E2= =80=99t > support it (as ntfs can), but still caching only metadata on the host and > letting the child vm/fs cache real data. > > My use case involves charging customers for their memory use so admittedl= y > that is one motivating factor, LOL. Plus, i certainly don=E2=80=99t want = one rude > VM marching through host ARC unfairly evacuating and starving the other > polite neighbors. > > VM=E2=80=99s swap space becomes another consideration and I treat it like= any > other =E2=80=98dumb=E2=80=99 filesystem with compression and checksumming= done by the > parent but recent versions of many operating systems may be paging out on= ly > already compressed data, so investigate your guest OS. 
I=E2=80=99ve found= lz4=E2=80=99s > claims of an almost-no-penalty early-abort to be vastly overstated when > dealing with zvols, small block sizes and high throughput so if you can b= e > certain you=E2=80=99ll be dealing with only compressed data then turn it = off. For > the virtual memory pagers in most current-day OS=E2=80=99s though set com= pression > on the swap=E2=80=99s backing zvol to lz4. > > Another factor is the ZIL. One VM can hoard your synchronous write > performance. Solutions are beyond the scope of this already-too-long emai= l > :) but I=E2=80=99d be happy to elaborate if queried. > > And then there=E2=80=99s always netbooting guests from NFS mounts served = by the > host and giving the guest no virtual disks, don=E2=80=99t forget to consi= der that > option. > > Hope this provokes some fruitful ideas for you. Glad to philosophize abou= t > ZFS setups with ya=E2=80=99ll :) > > -chad > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Fri Aug 28 00:12:56 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2B2219C4930 for ; Fri, 28 Aug 2015 00:12:56 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0A2E81BDE for ; Fri, 28 Aug 2015 00:12:55 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 18D023F726 for ; Thu, 27 Aug 2015 20:12:55 -0400 (EDT) Message-ID: <55DFA786.8090809@sneakertech.com> Date: Thu, 27 Aug 2015 20:12:54 -0400 From: Quartz MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Options for zfs inside a VM backed by zfs on the host References: <20150827061044.GA10221@blazingdot.com> <20150827062015.GA10272@blazingdot.com> <1a6745e27d184bb99eca7fdbdc90c8b5@SERVER.ad.usd-group.com> <55DF46F5.4070406@redbarn.org> <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Aug 2015 00:12:56 -0000 > I am right now exploring the question: are SSD ZILs necessary in an all SSD > pool? Something mentioned in another recent thread on this list (or maybe it was -questions?) was that yes, you really should consider a separate ZIL if you're using primarily SSDs. Without a separate disk, log writes have to steal blocks from the pool itself which then have to be deleted afterwards to let go of the space. Besides causing excess file fragmentation, the write-delete cycle doesn't play well with SSDs and trim and can seriously hamper performance. With a dedicated disk, it writes and then just leaves it there, only overwriting later if necessary. 
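For reference, attaching a dedicated log device is a one-liner; the pool name (tank) and the gpt labels here are only examples:

  # single SLOG device
  zpool add tank log gpt/slog0

  # or mirrored, so a failing log device can't cost you recently logged
  # sync writes after a crash
  zpool add tank log mirror gpt/slog0 gpt/slog1

Whether that actually buys anything on an all-SSD pool is still something to benchmark against your own sync-write workload.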
From owner-freebsd-fs@freebsd.org Fri Aug 28 09:55:09 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6F2639C4C77 for ; Fri, 28 Aug 2015 09:55:09 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com [209.85.212.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0DF291084 for ; Fri, 28 Aug 2015 09:55:08 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by wibcx1 with SMTP id cx1so8688526wib.1 for ; Fri, 28 Aug 2015 02:55:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-type :content-transfer-encoding; bh=tUtRcxOLMWAKZbQQo6yO/eosSCgrcRp0sHuketYr1BI=; b=ZrMIE6ovvRElu+5jvrveohvXFSs6yA9pF9NznJSmSBoznkuoU0A/b114zqZhwMJNSC fW3qpmOC6Wq2G2yDFZpwJLoJ/yAW/yFmS0ytgOajxxU/il6WfAQ6b9iLIk3aDDcDnHXz lFQElFHuKg0In3S6rs42DmGBu+nKFkClEjwhp/EK+H9VDr4F08BrtgkV65z8luWckS1D Ms249zhBVPxHrblX3OPYgtdfIs+xhkMQerOAW0a8sytA6k4F39yNzbo/VscCeSb81uBV bmIMgeIqaE7rXDzImDLSNqlUuzLNd9nnTW7HN30LYmAMVgN4wEQjuP/WxjuQ2Vov/F8i 39Zg== X-Gm-Message-State: ALoCoQlryyfpS2PsxLVjhTW60Ly2jij1tZm3PsI+aldJSST3sOJOwwADDzAOvZHPQ7lxhAdRVvBy X-Received: by 10.194.57.205 with SMTP id k13mr10250051wjq.100.1440755700678; Fri, 28 Aug 2015 02:55:00 -0700 (PDT) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk. [82.69.141.170]) by smtp.gmail.com with ESMTPSA id y13sm7227805wjq.26.2015.08.28.02.54.59 for (version=TLSv1/SSLv3 cipher=OTHER); Fri, 28 Aug 2015 02:54:59 -0700 (PDT) Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed) To: freebsd-fs@freebsd.org References: <55BB443E.8040801@denninger.net> <55CF7926.1030901@denninger.net> <55DF7191.2080409@denninger.net> From: Steven Hartland Message-ID: <55E02FF5.2060805@multiplay.co.uk> Date: Fri, 28 Aug 2015 10:55:01 +0100 User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Aug 2015 09:55:09 -0000 You would need to have a very broken TRIM implementation for that to happen, do you have any details on the devices involved? On 27/08/2015 21:30, Sean Chittenden wrote: > Have you tried disabling TRIM? We recently ran in to an issue where a `zfs delete` on a large dataset caused the host to panic because TRIM was tripping over the ZFS deadman timer. Disabling TRIM worked as valid workaround for us. ? You mentioned a recent move to SSDs, so this can happen, esp after the drive has experienced a little bit of actual work. ? -sc > > > -- > Sean Chittenden > sean@chittenden.org > > >> On Aug 27, 2015, at 13:22, Karl Denninger wrote: >> >> On 8/15/2015 12:38, Karl Denninger wrote: >>> Update: >>> >>> This /appears /to be related to attempting to send or receive a >>> /cloned /snapshot. 
>>> >>> I use /beadm /to manage boot environments and the crashes have all >>> come while send/recv-ing the root pool, which is the one where these >>> clones get created. It is /not /consistent within a given snapshot >>> when it crashes and a second attempt (which does a "recovery" >>> send/receive) succeeds every time -- I've yet to have it panic twice >>> sequentially. >>> >>> I surmise that the problem comes about when a file in the cloned >>> snapshot is modified, but this is a guess at this point. >>> >>> I'm going to try to force replication of the problem on my test system. >>> >>> On 7/31/2015 04:47, Karl Denninger wrote: >>>> I have an automated script that runs zfs send/recv copies to bring a >>>> backup data set into congruence with the running copies nightly. The >>>> source has automated snapshots running on a fairly frequent basis >>>> through zfs-auto-snapshot. >>>> >>>> Recently I have started having a panic show up about once a week during >>>> the backup run, but it's inconsistent. It is in the same place, but I >>>> cannot force it to repeat. >>>> >>>> The trap itself is a page fault in kernel mode in the zfs code at >>>> zfs_unmount_snap(); here's the traceback from the kvm (sorry for the >>>> image link but I don't have a better option right now.) >>>> >>>> I'll try to get a dump, this is a production machine with encrypted swap >>>> so it's not normally turned on. >>>> >>>> Note that the pool that appears to be involved (the backup pool) has >>>> passed a scrub and thus I would assume the on-disk structure is ok..... >>>> but that might be an unfair assumption. It is always occurring in the >>>> same dataset although there are a half-dozen that are sync'd -- if this >>>> one (the first one) successfully completes during the run then all the >>>> rest will as well (that is, whenever I restart the process it has always >>>> failed here.) The source pool is also clean and passes a scrub. >>>> >>>> traceback is at http://www.denninger.net/kvmimage.png; apologies for the >>>> image traceback but this is coming from a remote KVM. >>>> >>>> I first saw this on 10.1-STABLE and it is still happening on FreeBSD >>>> 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see if >>>> the problem was something that had been addressed. >>>> >>>> >>> -- >>> Karl Denninger >>> karl@denninger.net >>> /The Market Ticker/ >>> /[S/MIME encrypted email preferred]/ >> Second update: I have now taken another panic on 10.2-Stable, same deal, >> but without any cloned snapshots in the source image. I had thought that >> removing cloned snapshots might eliminate the issue; that is now out the >> window. >> >> It ONLY happens on this one filesystem (the root one, incidentally) >> which is fairly-recently created as I moved this machine from spinning >> rust to SSDs for the OS and root pool -- and only when it is being >> backed up by using zfs send | zfs recv (with the receive going to a >> different pool in the same machine.) I have yet to be able to provoke >> it when using zfs send to copy to a different machine on the same LAN, >> but given that it is not able to be reproduced on demand I can't be >> certain it's timing related (e.g. performance between the two pools in >> question) or just that I haven't hit the unlucky combination. 
>> >> This looks like some sort of race condition and I will continue to see >> if I can craft a case to make it occur "on demand" >> >> -- >> Karl Denninger >> karl@denninger.net >> /The Market Ticker/ >> /[S/MIME encrypted email preferred]/ > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Fri Aug 28 16:27:32 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3507D9C581A; Fri, 28 Aug 2015 16:27:32 +0000 (UTC) (envelope-from milios@ccsys.com) Received: from cargobay.net (cargobay.net [198.178.123.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F21191A9A; Fri, 28 Aug 2015 16:27:31 +0000 (UTC) (envelope-from milios@ccsys.com) Received: from [192.168.0.2] (cblmdm72-240-160-19.buckeyecom.net [72.240.160.19]) by cargobay.net (Postfix) with ESMTPSA id B594ADE8; Fri, 28 Aug 2015 16:23:36 +0000 (UTC) Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Options for zfs inside a VM backed by zfs on the host From: "Chad J. Milios" In-Reply-To: Date: Fri, 28 Aug 2015 12:27:22 -0400 Cc: freebsd-fs@freebsd.org, "freebsd-virtualization@freebsd.org" Message-Id: <8DB91B3A-44DC-4650-9E90-56F7DE2ABC42@ccsys.com> References: <20150827061044.GA10221@blazingdot.com> <20150827062015.GA10272@blazingdot.com> <1a6745e27d184bb99eca7fdbdc90c8b5@SERVER.ad.usd-group.com> <55DF46F5.4070406@redbarn.org> <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com> To: Tenzin Lhakhang X-Mailer: Apple Mail (2.2104) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Aug 2015 16:27:32 -0000 > On Aug 27, 2015, at 7:47 PM, Tenzin Lhakhang = wrote: >=20 > On Thu, Aug 27, 2015 at 3:53 PM, Chad J. Milios > wrote: >=20 > Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately = there are really no simple answers. You must consider your use case, the = host and vm hardware/software configuration, perform meaningful = benchmarks and, if you care about data integrity, thorough tests of the = likely failure modes (all far more easily said than done). I=E2=80=99m = curious to hear more about your use case(s) and setups so as to offer = better insight on what alternatives may make more/less sense for you. = Performance needs? Are you striving for lower individual latency or = higher combined throughput? How critical are integrity and availability? = How do you prefer your backup routine? Do you handle that in guest or = host? Want features like dedup and/or L2ARC up in the mix? (Then = everything bears reconsideration, just about triple your research and = testing efforts.) >=20 > Sorry, I=E2=80=99m really not trying to scare anyone away from ZFS. It = is awesome and capable of providing amazing solutions with very reliable = and sensible behavior if handled with due respect, fear, monitoring and = upkeep. 
:) >=20 > There are cases to be made for caching [meta-]data in the child, in = the parent, checksumming in the child/parent/both, compressing in the = child/parent. I believe `gstat` along with your custom-made benchmark or = test load will greatly help guide you. >=20 > ZFS on ZFS seems to be a hardly studied, seldom reported, never = documented, tedious exercise. Prepare for accelerated greying and = balding of your hair. The parent's volblocksize, child's ashift, = alignment, interactions involving raidz stripes (if used) can lead to = problems from slightly decreased performance and storage efficiency to = pathological write amplification within ZFS, performance and = responsiveness crashing and sinking to the bottom of the ocean. Some = datasets can become veritable black holes to vfs system calls. You may = see ZFS reporting elusive errors, deadlocking or panicing in the child = or parent altogether. With diligence though, stable and performant = setups can be discovered for many production situations. >=20 > For example, for a zpool (whether used by a VM or not, locally, thru = iscsi, ggate[cd], or whatever) atop zvol which sits on parent zpool with = no redundancy, I would set primarycache=3Dmetadata checksum=3Doff = compression=3Doff for the zvol(s) on the host(s) and for the most part = just use the same zpool settings and sysctl tunings in the VM (or child = zpool, whatever role it may conduct) that i would otherwise use on bare = cpu and bare drives (defaults + compression=3Dlz4 atime=3Doff). However, = that simple case is likely not yours. >=20 > With ufs/ffs/ntfs/ext4 and most other filesystems atop a zvol i use = checksums on the parent zvol, and compression too if the child doesn=E2=80= =99t support it (as ntfs can), but still caching only metadata on the = host and letting the child vm/fs cache real data. >=20 > My use case involves charging customers for their memory use so = admittedly that is one motivating factor, LOL. Plus, i certainly don=E2=80= =99t want one rude VM marching through host ARC unfairly evacuating and = starving the other polite neighbors. >=20 > VM=E2=80=99s swap space becomes another consideration and I treat it = like any other =E2=80=98dumb=E2=80=99 filesystem with compression and = checksumming done by the parent but recent versions of many operating = systems may be paging out only already compressed data, so investigate = your guest OS. I=E2=80=99ve found lz4=E2=80=99s claims of an = almost-no-penalty early-abort to be vastly overstated when dealing with = zvols, small block sizes and high throughput so if you can be certain = you=E2=80=99ll be dealing with only compressed data then turn it off. = For the virtual memory pagers in most current-day OS=E2=80=99s though = set compression on the swap=E2=80=99s backing zvol to lz4. >=20 > Another factor is the ZIL. One VM can hoard your synchronous write = performance. Solutions are beyond the scope of this already-too-long = email :) but I=E2=80=99d be happy to elaborate if queried. >=20 > And then there=E2=80=99s always netbooting guests from NFS mounts = served by the host and giving the guest no virtual disks, don=E2=80=99t = forget to consider that option. >=20 > Hope this provokes some fruitful ideas for you. Glad to philosophize = about ZFS setups with ya=E2=80=99ll :) >=20 > -chad > That was a really awesome read! The idea of turning metadata on at = the backend zpool and then data on the VM was interesting, I will give = that a try. 
> Please can you elaborate more on the ZILs and synchronous writes by VMs; that seems like a great topic.
> I am right now exploring the question: are SSD ZILs necessary in an all-SSD pool? And then the question of NVMe SSD ZILs on top of an all-SSD pool. My guess at the moment is that SSD ZILs are not necessary at all in an SSD pool during intensive IO. I've been told that ZILs are always there to help you, but when your pool's aggregate IOPS is greater than that of a ZIL, it doesn't seem to make sense. Or is it the latency of writing to a single disk vs striping across your "fast" vdevs?
>
> Thanks,
> Tenzin

Well, the ZIL (ZFS Intent Log) is basically an absolute necessity. Without it, a call to fsync() could take over 10 seconds on a system serving a relatively light load. HOWEVER, a source of confusion is the terminology people often throw around. See, the ZIL is basically a concept, a method, a procedure. It is not a device. A 'SLOG' is what most people mean when they say ZIL. That is a Separate Log device. (ZFS 'log' vdev type; documented in man 8 zpool.) When you aren't using a SLOG device, your ZIL is transparently allocated by ZFS, roughly a little chunk of space reserved near the "middle" of the main pool (at least ZFS attempts to locate it there physically, but on SSDs or SMR HDs there's no way to and no point to), unless you've gone out of your way to deliberately disable the ZIL entirely.

The other confusion often surrounding the ZIL is when it gets used. Most writes (in the world) bypass the ZIL (built-in or SLOG) entirely anyway because they are asynchronous writes, not synchronous ones. Only the latter are candidates to clog a ZIL bottleneck. You will need to consider your workload specifically to know whether a SLOG will help, and if so, how much SLOG performance is required to not put a damper on the pool's overall throughput capability. Conversely, you want to know how much SLOG performance is overkill, because NVMe and SLC SSDs are freaking expensive.

Now for many on the list this is going to be elementary information, so I apologize, but I come across this question all the time: sync vs async writes. I'm sure there are many who might find this informative, and with ZFS the difference becomes more profound and important than with most other filesystems.

See, ZFS is always bundling up batches of writes into transaction groups (TXGs). Without extraneous detail, it can be understood that these basically happen every 5 seconds (sysctl vfs.zfs.txg.timeout). So picture that ZFS typically has two TXGs it's worried about at any given time: one is being filled in memory while the previous one is being flushed out to physical disk.

So when you write something asynchronously the operating system is going to say 'aye aye captain' and send you along your merry way very quickly, but if you lose power or crash and then reboot, ZFS only guarantees you a CONSISTENT state, not your most recent state. Your pool may come back online and you've lost 5-15 seconds worth of work. For your typical desktop or workstation workload that's probably no big deal. You lost 15 seconds of effort, you repeat it, and continue about your business.
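For readers unfamiliar with the terminology above, a short sketch of what a SLOG ("log" vdev) and the TXG interval look like in practice, assuming a hypothetical pool "tank" and NVMe device names nvd0/nvd1 (see zpool(8) for the authoritative syntax):

    # attach a dedicated SLOG to an existing pool
    zpool add tank log nvd0
    # or mirrored, if losing the last few seconds of sync writes on a
    # log-device failure is unacceptable:
    #   zpool add tank log mirror nvd0 nvd1
    zpool status tank        # the device shows up under a separate "logs" section

    # the roughly-5-second transaction group interval mentioned above
    sysctl vfs.zfs.txg.timeout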
However, imagine a mail server that received many, many emails in just that short time and has told all the senders of all those messages "got it, thumbs up". You cannot redact those assurances you handed out. You have no idea who to contact to ask to repeat themselves, and even if you did, it's likely the sending mail servers have long since forgotten about those particular messages. So, with each message you receive, after you tell the operating system to write the data you issue a call to fsync(new_message), and only after that call returns do you give the sender the thumbs up to forget the message and leave it in your capable hands to deliver it to its destination. Thanks to the ZIL, fsync() will typically return in milliseconds or less instead of the many seconds it could take for that write in a bundled TXG to end up physically saved. In an ideal world, the ZIL gets written to and never read again, the data just becoming stale and overwritten. (The data stays in the in-memory TXG, so it's redundant in the ZIL once that TXG completes flushing.)

The email server is the typical example of the use of fsync, but there are thousands of others. Typically, applications using central databases are written in a simplistic way to assume the database is trustworthy, and fsync is how the database attempts to fulfill that requirement.

To complicate matters, consider VMs, particularly uncooperative, impolite, selfish VMs. Synchronous write IOPS are a particularly scarce and expensive resource which hasn't been increasing as quickly and cheaply as, say, I/O bandwidth, CPU speeds or memory capacities. To make it worse, the IOPS numbers most SSD makers advertise on their so-called spec sheets are untrustworthy; they have no standard benchmark or enforcement ("The PS in IOPS stands for Per Second, so we ran our benchmark on a fresh drive for one second and got 100,000 IOPS." Well, good for you, that is useless to me. Tell me what you can sustain all day long a year down the road.) and they're seldom accountable to anybody not buying 10,000 units. All this consolidation of VMs/containers/jails can really stress the sync I/O capability of even the biggest, baddest servers.

And FreeBSD, in all its glory, is not yet very well suited to the problem of multi-tenancy. (It's great if all jails and VMs on a server are owned and controlled by one stakeholder who can coordinate their friendly coexistence.) My firm develops and supports a proprietary shim into ZFS and jails for enforcing the polite sharing of bandwidth, total IOPS and sync IOPS, which can be applied to groups whose membership is arbitrary ZFS datasets. So there, that's my shameless plug, LOL. However, there are brighter minds than I working on this problem, and I'm hoping to maybe some time either participate in a more general development of such facilities with broader application into mainline FreeBSD or perhaps open source my own work eventually. (I guess I'm being more shy than selfish with it, LOL.)
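One rough way to learn "what you can sustain all day long", rather than trusting a spec sheet, is a long-running small-block sync-write benchmark. A sketch using fio (available in ports as benchmarks/fio); the target directory, block size and runtime are illustrative assumptions, not values from this thread:

    # random 4k writes with an fsync() after every write, run long enough
    # to get past any fresh-drive or cache honeymoon period
    fio --name=syncwrite --directory=/tank/bench --rw=randwrite \
        --bs=4k --size=4g --ioengine=psync --fsync=1 \
        --time_based --runtime=3600

Watching gstat or zpool iostat -v on the host while it runs shows where the synchronous traffic actually lands.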
Hope that's food for thought for some of you

-chad

From owner-freebsd-fs@freebsd.org Sat Aug 29 06:28:00 2015
From: Peter Jeremy
To: freebsd-fs@freebsd.org
Subject: Panic in zfs_blkptr_verify()
Date: Sat, 29 Aug 2015 16:27:43 +1000
Message-ID: <20150829062743.GA2996@server.rulingia.com>

I'm trying to upgrade my main (amd64) server from 10-stable r276177 to
r287251 but the new kernel consistently panics:

panic: Solaris(panic): blkptr at 0xfffff80015961848 DVA 0 has invalid OFFSET 15724224479232
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe086027aff0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe086027b0a0
vpanic() at vpanic+0x126/frame 0xfffffe086027b0e0
panic() at panic+0x43/frame 0xfffffe086027b140
vcmn_err() at vcmn_err+0xcf/frame 0xfffffe086027b270
zfs_panic_recover() at zfs_panic_recover+0x60/frame 0xfffffe086027b2d0
zfs_blkptr_verify() at zfs_blkptr_verify+0x297/frame 0xfffffe086027b310
zio_read() at zio_read+0x2f/frame 0xfffffe086027b3a0
arc_read() at arc_read+0xb1e/frame 0xfffffe086027b450
dmu_objset_open_impl() at dmu_objset_open_impl+0x196/frame 0xfffffe086027b4e0
dsl_pool_init() at dsl_pool_init+0x2a/frame 0xfffffe086027b510
spa_load() at spa_load+0xa20/frame 0xfffffe086027b650
spa_load_best() at spa_load_best+0x6f/frame 0xfffffe086027b6c0
spa_open_common() at spa_open_common+0x102/frame 0xfffffe086027b730
pool_status_check() at pool_status_check+0x4e/frame 0xfffffe086027b760
zfsdev_ioctl() at zfsdev_ioctl+0x52e/frame 0xfffffe086027b800
devfs_ioctl_f() at devfs_ioctl_f+0x121/frame 0xfffffe086027b860
kern_ioctl() at kern_ioctl+0x160/frame 0xfffffe086027b8c0
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe086027b9a0
amd64_syscall() at amd64_syscall+0x22e/frame 0xfffffe086027bab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe086027bab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x801a1dc5a, rsp = 0x7fffffffd028, rbp = 0x7fffffffd0a0 ---

Unfortunately, I'm not sure how to resolve this. zfs_blkptr_verify() was
MFH in r277582 so the checks don't exist in my old kernel. This means it
could be a problem with one of my pools, rather than a software bug. But
there's no information in the panic that would let me identify where the
dodgy offset was found other than it's DVA 0 of some undefined blkptr_t.

I've tried setting dumpon prior to this point (and I can see that
dumpdev is set) but I don't get a crashdump. Since this is my primary
server, I'd prefer not to have it down for an extended period whilst I
rummaged around with ddb.

Any suggestions as to how to proceed?

--
Peter Jeremy

From owner-freebsd-fs@freebsd.org Sat Aug 29 10:02:36 2015
From: Xin Li
To: Peter Jeremy, freebsd-fs@freebsd.org
Subject: Re: Panic in zfs_blkptr_verify()
Date: Sat, 29 Aug 2015 03:02:31 -0700
Message-ID: <55E18337.5030104@delphij.net>
In-Reply-To: <20150829062743.GA2996@server.rulingia.com>
References: <20150829062743.GA2996@server.rulingia.com>

On 8/28/15 23:27, Peter Jeremy wrote:
> I'm trying to upgrade my main (amd64) server from 10-stable r276177 to
> r287251 but the new kernel consistently panics:
>
> Unfortunately, I'm not sure how to resolve this. zfs_blkptr_verify() was
> MFH in r277582 so the checks don't exist in my old kernel. This means it
> could be a problem with one of my pools, rather than a software bug. But
> there's no information in the panic that would let me identify where the
> dodgy offset was found other than it's DVA 0 of some undefined blkptr_t.

Unfortunately, I have to say that the pool may be beyond repair, and you
may have to try importing it read-only and recreating the pool, or
destroying it and restoring from a backup.

One thing worth trying is to set vfs.zfs.recover=1 from the loader and
import the pool read-only. If this succeeds, you can migrate as much
data as possible out of the pool.

You may also want to try zpool import -F (and -F -X if -F didn't work)
to see if discarding a few of the latest txgs helps.

> I've tried setting dumpon prior to this point (and I can see that
> dumpdev is set) but I don't get a crashdump. Since this is my primary

That's probably not very useful, because the panic is an assertion that
we know the block pointer is bad, and from the backtrace it looks like
it was the toplevel dataset.
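A sketch of the sequence described above, assuming a hypothetical pool name "mypool" and /mnt as the alternate mount root (neither is given in the thread):

    # at the loader "OK" prompt before booting
    set vfs.zfs.recover=1
    # (or the equivalent vfs.zfs.recover=1 line in /boot/loader.conf)

    # try a read-only import first and copy as much data off as possible
    zpool import -o readonly=on -R /mnt mypool

    # failing that, try rewinding to an earlier txg
    zpool import -F mypool
    zpool import -F -X mypool    # more aggressive rewind; last resort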
Cheers,

From owner-freebsd-fs@freebsd.org Sat Aug 29 14:55:07 2015
From: Gary Palmer
To: Quartz
Cc: freebsd-fs@freebsd.org
Subject: Re: Options for zfs inside a VM backed by zfs on the host
Date: Sat, 29 Aug 2015 15:55:04 +0100
Message-ID: <20150829145504.GA99821@in-addr.com>
In-Reply-To: <55DFA786.8090809@sneakertech.com>
References: <20150827061044.GA10221@blazingdot.com> <20150827062015.GA10272@blazingdot.com> <1a6745e27d184bb99eca7fdbdc90c8b5@SERVER.ad.usd-group.com> <55DF46F5.4070406@redbarn.org> <453A5A6F-E347-41AE-8CBC-9E0F4DA49D38@ccsys.com> <55DFA786.8090809@sneakertech.com>

On Thu, Aug 27, 2015 at 08:12:54PM -0400, Quartz wrote:
> > I am right now exploring the question: are SSD ZILs necessary in an all SSD
> > pool?
>
> Something mentioned in another recent thread on this list (or maybe it
> was -questions?) was that yes, you really should consider a separate ZIL
> if you're using primarily SSDs. Without a separate disk, log writes have
> to steal blocks from the pool itself which then have to be deleted
> afterwards to let go of the space. Besides causing excess file
> fragmentation, the write-delete cycle doesn't play well with SSDs and
> trim and can seriously hamper performance.
> With a dedicated disk, it writes and then just leaves it there, only
> overwriting later if necessary.

Presumably they're only necessary if you're dealing with sync writes?
If the vast majority of your workload is async writes, does a separate
ZIL SSD still help?

And I am still curious why ZFS has no stats that let you measure sync
vs async writes.

Gary
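There is no single counter labelled "sync vs async", but a rough picture can be pieced together. A sketch, assuming a host with the DTrace kernel modules loaded; the kstat name and probe availability can vary between releases, so treat these as starting points rather than a documented interface:

    # ZIL activity counters exposed as kstats, if present on your release
    sysctl kstat.zfs.misc.zil

    # count ZIL commits (roughly, fsync()/O_SYNC pressure) per process
    dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }'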