Date: Thu, 20 Sep 2018 20:27:02 -0400
From: Josh Gitlin <jgitlin@goboomtown.com>
To: freebsd-fs@freebsd.org
Subject: Troubleshooting kernel panic with zfs
Message-ID: <D54225AD-CC96-45E7-A203-D2C52E984963@goboomtown.com>
I am working to debug/troubleshoot a kernel panic on a FreeBSD ZFS iSCSI server. Specifically, I am trying to determine whether it is a bug or (more likely) a misconfiguration in our settings. The server is running 11.2-RELEASE-p2 with 15.6 GiB of RAM and has a single zpool with 4x2 mirrored vdevs, a 2x mirrored ZIL and 2x L2ARC. It runs pretty much nothing other than SSH and iSCSI (via ctld) and serves VM virtual disks to hypervisor servers over a 10GbE LAN.

The server experienced a kernel panic, and unfortunately we did not have dumpdev set in /etc/rc.conf (we have since corrected this), so the only information I have is what was on the screen before I rebooted it. (Because it's a production system, I couldn't mess around and had to reboot ASAP.)

trap number = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b3d567 at kdb_backtrace+0x67
#1 0xffffffff80af6b07 at vpanic+0x177
#2 0xffffffff80af6983 at panic+0x43
#3 0xffffffff80f77fcf at trap_fatal+0x35f
#4 0xffffffff80f78029 at trap_pfault+0x49
#5 0xffffffff80f777f7 at trap+0x2c7
#6 0xffffffff80f57dac at calltrap+0x8
#7 0xffffffff80dee7e2 at kmem_back+0xf2
#8 0xffffffff80dee6c0 at kmem_malloc+0x60
#9 0xffffffff80de6172 at keg_alloc_slab+0xe2
#10 0xffffffff80de8b7e at keg_fetch_slab+0x14e
#11 0xffffffff80de8364 at zone_fetch_slab+0x64
#12 0xffffffff80de848f at zone_import+0x3f
#13 0xffffffff80de4b99 at uma_zalloc_arg+0x3d9
#14 0xffffffff826e6ab2 at zio_write_compress+0x1e2
#15 0xffffffff826e574c at zio_execute+0xac
#16 0xffffffff80bled74 at taskqueue_run_locked+0x154
#17 0xffffffff80b4fed8 at taskqueue_thread_loop+0x98
Uptime: 18d18h31m6s
mpr0: Sending StopUnit: path (xpt0:mpr0:0:10:ffffffff): handle 10
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:13:ffffffff): handle 13
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:16:ffffffff): handle 16
mpr0: Incrementing SSU count

My hunch is that, since the panic happened inside kmem_malloc, we were unable to allocate memory for a zio_write_compress call (the pool does have ZFS compression enabled), and hence this is a tuning issue and not a bug... but I am looking for confirmation and/or suggested changes or troubleshooting steps. The ZFS tuning configuration has been stable for years, so it may be a change in behavior or traffic... If this looks like it might be a bug, I will be able to get more information from a minidump if it recurs and can follow up on this thread.

Any advice or suggestions are welcome!

[jgitlin@zfs3 ~]$ zpool status
  pool: srv
 state: ONLINE
  scan: scrub repaired 0 in 2h32m with 0 errors on Tue Sep 11 20:32:18 2018
config:

        NAME              STATE     READ WRITE CKSUM
        srv               ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            gpt/s5        ONLINE       0     0     0
            gpt/s9        ONLINE       0     0     0
          mirror-1        ONLINE       0     0     0
            gpt/s6        ONLINE       0     0     0
            gpt/s10       ONLINE       0     0     0
          mirror-2        ONLINE       0     0     0
            gpt/s7        ONLINE       0     0     0
            gpt/s11       ONLINE       0     0     0
          mirror-3        ONLINE       0     0     0
            gpt/s8        ONLINE       0     0     0
            gpt/s12       ONLINE       0     0     0
        logs
          mirror-4        ONLINE       0     0     0
            gpt/s2-zil    ONLINE       0     0     0
            gpt/s3-zil    ONLINE       0     0     0
        cache
          gpt/s2-cache    ONLINE       0     0     0
          gpt/s3-cache    ONLINE       0     0     0

errors: No known data errors

ZFS tuning:

vfs.zfs.delay_min_dirty_percent=90
vfs.zfs.dirty_data_max=4294967296
vfs.zfs.dirty_data_sync=3221225472
vfs.zfs.prefetch_disable=1
vfs.zfs.top_maxinflight=128
vfs.zfs.trim.txg_delay=8
vfs.zfs.txg.timeout=20
vfs.zfs.vdev.aggregation_limit=524288
vfs.zfs.vdev.scrub_max_active=3
vfs.zfs.l2arc_write_boost=134217728
vfs.zfs.l2arc_write_max=134217728
vfs.zfs.l2arc_feed_min_ms=200
vfs.zfs.min_auto_ashift=12

--
<http://www.goboomtown.com/>
Josh Gitlin
Senior DevOps Engineer
(415) 690-1610 x155
Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.
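To put the byte-valued tunables from the ZFS tuning list above in perspective against the 15.6 GiB of RAM mentioned in the message, the following is a small arithmetic sketch. `TUNABLES_BYTES`, `RAM_GIB` and `describe` are hypothetical names chosen for illustration, and the RAM figure is the approximate value quoted in the message rather than one read from the running system. It only converts units and computes ratios; it does not by itself show that these settings caused the panic.

```python
# Byte-valued ZFS sysctls copied from the tuning list in the message.
TUNABLES_BYTES = {
    "vfs.zfs.dirty_data_max": 4294967296,
    "vfs.zfs.dirty_data_sync": 3221225472,
    "vfs.zfs.l2arc_write_max": 134217728,
    "vfs.zfs.l2arc_write_boost": 134217728,
}

# Physical memory as stated in the message (approximate, assumed value).
RAM_GIB = 15.6

GIB = 1024 ** 3
MIB = 1024 ** 2


def describe(name: str, value: int, ram_gib: float = RAM_GIB) -> str:
    """Return a human-readable size for `value` bytes and its share of RAM."""
    gib = value / GIB
    share = 100.0 * gib / ram_gib
    # Use GiB for values of at least 1 GiB, MiB otherwise.
    if value >= GIB:
        size = f"{gib:.1f} GiB"
    else:
        size = f"{value / MIB:.0f} MiB"
    return f"{name} = {size} ({share:.1f}% of RAM)"


if __name__ == "__main__":
    for name, value in TUNABLES_BYTES.items():
        print(describe(name, value))
```

Run as a script, it prints one line per tunable, for example `vfs.zfs.dirty_data_max = 4.0 GiB (25.6% of RAM)`.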