From owner-freebsd-stable@freebsd.org Sat Feb 4 13:27:01 2017
From: "Matthew X. Economou"
Subject: Swapping from a zvol results in a deadman panic
Date: Sat, 4 Feb 2017 08:18:28 -0500
List-Id: Production branch of FreeBSD source code
X-BeenThere: freebsd-stable@freebsd.org

My FreeBSD 10.3-RELEASE-p16 server crashes in the middle of a Poudriere bulk run (see below).
This crash happens even if I lower vfs.zfs.arc_max or tweak vm.v_free_min/target/reserved/severe. I'm looking for configuration advice in case I missed something obvious, since this seems to work on Illumos- and Linux-derived O/Ses, but failing that, I'd like to get some advice as to how to go about debugging this. I doubt the deadman timer causes the system to stop responding. It's more likely a race condition elsewhere.

The pool itself uses 4k sectors and is geli-encrypted. I configured the swap zvol based on root-on-ZFS install instructions found in the FreeBSD wiki:

    zfs create -V 6G -o org.freebsd:swap=on -o checksum=off \
        -o compression=off -o dedup=off -o sync=disabled \
        -o primarycache=none zroot/swap

The ZoL wiki recommends a slightly different zvol configuration:

    zfs create -V 4G -b $(getconf PAGESIZE) -o logbias=throughput \
        -o sync=always -o primarycache=metadata \
        -o com.sun:auto-snapshot=false rpool/swap

I'm not sure how much of this applies to FreeBSD due to differences in kernel design/implementation.

Does anyone have an idea of what might be going on and how I might get this working?

last pid: 35097; load averages: 0.54, 4.38, 5.99 up 0+05:23:35 03:27:19
94 processes: 1 running, 89 sleeping, 4 waiting
CPU: 0.1% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.9% idle
Mem: 911M Active, 1983M Inact, 979M Wired, 772K Cache, 320K Buf, 14M Free
ARC: 220M Total, 12M MFU, 45M MRU, 34M Anon, 6645K Header, 122M Other
Swap: 6144M Total, 574M Used, 5570M Free, 9% Inuse

panic: I/O to pool 'zroot' appears to be hung on vdev guid 13314812526404996608 at '/dev/da0p3.eli'.
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8098e3e0 at kdb_backtrace+0x60
#1 0xffffffff809510b6 at vpanic+0x126
#2 0xffffffff80950f83 at panic+0x43
#3 0xffffffff81a3ddd3 at vdev_deadman+0x123
#4 0xffffffff81a3dce0 at vdev_deadman+0x30
#5 0xffffffff81a3dce0 at vdev_deadman+0x30
#6 0xffffffff81a325a5 at spa_deadman+0x85
#7 0xffffffff80966c2b at softclock_call_cc+0x17b
#8 0xffffffff80967054 at softclock+0x94
#9 0xffffffff8091c9eb at intr_event_execute_handlers+0xab
#10 0xffffffff8091ce36 at ithread_loop+0x96
#11 0xffffffff8091a53a at fork_exit+0x9a
#12 0xffffffff80d3be0e at fork_trampoline+0xe
Uptime: 1h8m24s

-- 
"The lyf so short, the craft so longe to lerne."
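
As a starting point for the debugging question above, here is a minimal sketch of read-only checks that could be gathered before the next bulk run. It is not a fix. The helper name swap_zvol_checks is hypothetical, the property list is simply the set of settings that differ between the two zfs create recipes quoted in the post, and the two deadman sysctl names are an assumption about FreeBSD's ZFS sysctl tree that should be confirmed with "sysctl -a | grep deadman" on the affected host. The function only prints the commands; it runs nothing itself.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) read-only diagnostic commands
# for a swap zvol. The dataset defaults to zroot/swap, the name from the post.
swap_zvol_checks() {
    dataset=${1:-zroot/swap}

    # Properties that differ between the FreeBSD wiki and ZoL wiki recipes.
    props=volblocksize,checksum,compression,sync,primarycache,logbias
    echo "zfs get -H -o property,value $props $dataset"

    # Deadman tunables. These names are an assumption, not from the post.
    echo "sysctl vfs.zfs.deadman_enabled vfs.zfs.deadman_synctime_ms"

    # Swap devices and usage as reported by the kernel.
    echo "swapinfo -h"
}

swap_zvol_checks "$@"
```

Saved under a name of your choosing, the script's output can be reviewed first and then piped to sh on the affected host to run the three commands as written.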