From: Josh Gitlin
Subject: Troubleshooting kernel panic with zfs
Date: Thu, 20 Sep 2018 20:27:02 -0400
To: freebsd-fs@freebsd.org

I am working to debug/troubleshoot a kernel panic with a FreeBSD ZFS
iSCSI server, specifically trying to determine if it's a bug or (more
likely) a misconfiguration in our settings. Server is running
11.2-RELEASE-p2 with 15.6 GiB of RAM and has a single zpool with four
2-way mirrored vdevs, a mirrored pair of ZIL (log) devices and two L2ARC
(cache) devices. Server runs pretty much nothing other than SSH and iSCSI
(via ctld) and serves VM virtual disks to hypervisor servers over 10GbE
LAN.

The server experienced a kernel panic and we unfortunately did not have
dumpdev set in /etc/rc.conf (we have since corrected this), so the only
info I have is what was on the screen before I rebooted it.
(Because it's a production system I couldn't mess around and had to
reboot ASAP.)

trap number = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b3d567 at kdb_backtrace+0x67
#1 0xffffffff80af6b07 at vpanic+0x177
#2 0xffffffff80af6983 at panic+0x43
#3 0xffffffff80f77fcf at trap_fatal+0x35f
#4 0xffffffff80f78029 at trap_pfault+0x49
#5 0xffffffff80f777f7 at trap+0x2c7
#6 0xffffffff80f57dac at calltrap+0x8
#7 0xffffffff80dee7e2 at kmem_back+0xf2
#8 0xffffffff80dee6c0 at kmem_malloc+0x60
#9 0xffffffff80de6172 at keg_alloc_slab+0xe2
#10 0xffffffff80de8b7e at keg_fetch_slab+0x14e
#11 0xffffffff80de8364 at zone_fetch_slab+0x64
#12 0xffffffff80de848f at zone_import+0x3f
#13 0xffffffff80de4b99 at uma_zalloc_arg+0x3d9
#14 0xffffffff826e6ab2 at zio_write_compress+0x1e2
#15 0xffffffff826e574c at zio_execute+0xac
#16 0xffffffff80bled74 at taskqueue_run_locked+0x154
#17 0xffffffff80b4fed8 at taskqueue_thread_loop+0x98
Uptime: 18d18h31m6s
mpr0: Sending StopUnit: path (xpt0:mpr0:0:10:ffffffff): handle 10
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:13:ffffffff): handle 13
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:16:ffffffff): handle 16
mpr0: Incrementing SSU count

My hunch is that, given this was inside kmem_malloc, we were unable to
allocate memory for a zio_write_compress call (the pool does have ZFS
compression on) and hence this is a tuning issue and not a bug... but I
am looking for confirmation and/or suggested changes/troubleshooting
steps. The ZFS tuning configuration has been stable for years, so it may
be a change in behavior or traffic... If this looks like it might be a
bug, I will be able to get more information from a minidump if it
reoccurs and can follow up on this thread.

Any advice or suggestions are welcome!
[jgitlin@zfs3 ~]$ zpool status
  pool: srv
 state: ONLINE
  scan: scrub repaired 0 in 2h32m with 0 errors on Tue Sep 11 20:32:18 2018
config:

        NAME              STATE     READ WRITE CKSUM
        srv               ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            gpt/s5        ONLINE       0     0     0
            gpt/s9        ONLINE       0     0     0
          mirror-1        ONLINE       0     0     0
            gpt/s6        ONLINE       0     0     0
            gpt/s10       ONLINE       0     0     0
          mirror-2        ONLINE       0     0     0
            gpt/s7        ONLINE       0     0     0
            gpt/s11       ONLINE       0     0     0
          mirror-3        ONLINE       0     0     0
            gpt/s8        ONLINE       0     0     0
            gpt/s12       ONLINE       0     0     0
        logs
          mirror-4        ONLINE       0     0     0
            gpt/s2-zil    ONLINE       0     0     0
            gpt/s3-zil    ONLINE       0     0     0
        cache
          gpt/s2-cache    ONLINE       0     0     0
          gpt/s3-cache    ONLINE       0     0     0

errors: No known data errors

ZFS tuning:

vfs.zfs.delay_min_dirty_percent=90
vfs.zfs.dirty_data_max=4294967296
vfs.zfs.dirty_data_sync=3221225472
vfs.zfs.prefetch_disable=1
vfs.zfs.top_maxinflight=128
vfs.zfs.trim.txg_delay=8
vfs.zfs.txg.timeout=20
vfs.zfs.vdev.aggregation_limit=524288
vfs.zfs.vdev.scrub_max_active=3
vfs.zfs.l2arc_write_boost=134217728
vfs.zfs.l2arc_write_max=134217728
vfs.zfs.l2arc_feed_min_ms=200
vfs.zfs.min_auto_ashift=12

-- 
Josh Gitlin
Senior DevOps Engineer
(415) 690-1610 x155
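
For anyone reading the byte-valued tunables above against the 15.6 GiB RAM figure, here is a minimal Python sketch that converts them to binary units and shows each as a share of RAM. It only uses numbers quoted in this message; the helper names (human, share_of_ram) are made up for illustration, and the reading of delay_min_dirty_percent as a percentage of dirty_data_max is an assumption about how the write throttle uses that value. It does not show whether the panic was a tuning issue or a bug; it just puts the configured sizes next to the amount of memory.

```python
# All figures are copied from the message; nothing here is measured on the server.
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024

ram_bytes = 15.6 * GIB  # "15.6 GiB of RAM" as stated in the post

# Byte-valued tunables from the "ZFS tuning" list.
tunables = {
    "vfs.zfs.dirty_data_max": 4294967296,
    "vfs.zfs.dirty_data_sync": 3221225472,
    "vfs.zfs.l2arc_write_max": 134217728,
    "vfs.zfs.l2arc_write_boost": 134217728,
    "vfs.zfs.vdev.aggregation_limit": 524288,
}

# Assumption: delay_min_dirty_percent (90) is applied to dirty_data_max.
delay_min_dirty_percent = 90
tunables["(derived) write delay starts at"] = (
    tunables["vfs.zfs.dirty_data_max"] * delay_min_dirty_percent // 100
)


def human(n):
    """Format a byte count using binary units (hypothetical helper)."""
    for unit, size in (("GiB", GIB), ("MiB", MIB), ("KiB", KIB)):
        if n >= size:
            return f"{n / size:g} {unit}"
    return f"{n} B"


def share_of_ram(n, ram=ram_bytes):
    """Return n as a percentage of the stated RAM size."""
    return 100.0 * n / ram


for name, value in tunables.items():
    print(f"{name:34s} {human(value):>8s} {share_of_ram(value):5.1f}% of RAM")
```

With these inputs, dirty_data_max comes out at 4 GiB, roughly a quarter of the stated RAM, and dirty_data_sync at 3 GiB.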