Date: Thu, 20 Sep 2018 20:27:02 -0400
From: Josh Gitlin <jgitlin@goboomtown.com>
To: freebsd-fs@freebsd.org
Subject: Troubleshooting kernel panic with zfs
Message-ID: <D54225AD-CC96-45E7-A203-D2C52E984963@goboomtown.com>

I am working to debug/troubleshoot a kernel panic with a FreeBSD ZFS iSCSI server, specifically trying to determine if it's a bug or (more likely) a misconfiguration in our settings. Server is running 11.2-RELEASE-p2 with 15.6 GiB of RAM and has a single zpool with 4x2 mirrored vdevs, 2x mirrored ZIL and 2x L2ARC. Server runs pretty much nothing other than SSH and iSCSI (via ctld) and serves VM virtual disks to hypervisor servers over 10GbE LAN.

The server experienced a kernel panic and we unfortunately did not have dumpdev set in /etc/rc.conf (we have since corrected this), so the only info I have is what was on the screen before I rebooted it. (Because it's a production system I couldn't mess around and had to reboot ASAP.)

    trap number = 12
    panic: page fault
    cpuid = 6
    KDB: stack backtrace:
    #0 0xffffffff80b3d567 at kdb_backtrace+0x67
    #1 0xffffffff80af6b07 at vpanic+0x177
    #2 0xffffffff80af6983 at panic+0x43
    #3 0xffffffff80f77fcf at trap_fatal+0x35f
    #4 0xffffffff80f78029 at trap_pfault+0x49
    #5 0xffffffff80f777f7 at trap+0x2c7
    #6 0xffffffff80f57dac at calltrap+0x8
    #7 0xffffffff80dee7e2 at kmem_back+0xf2
    #8 0xffffffff80dee6c0 at kmem_malloc+0x60
    #9 0xffffffff80de6172 at keg_alloc_slab+0xe2
    #10 0xffffffff80de8b7e at keg_fetch_slab+0x14e
    #11 0xffffffff80de8364 at zone_fetch_slab+0x64
    #12 0xffffffff80de848f at zone_import+0x3f
    #13 0xffffffff80de4b99 at uma_zalloc_arg+0x3d9
    #14 0xffffffff826e6ab2 at zio_write_compress+0x1e2
    #15 0xffffffff826e574c at zio_execute+0xac
    #16 0xffffffff80bled74 at taskqueue_run_locked+0x154
    #17 0xffffffff80b4fed8 at taskqueue_thread_loop+0x98
    Uptime: 18d18h31m6s
    mpr0: Sending StopUnit: path (xpt0:mpr0:0:10:ffffffff): handle 10
    mpr0: Incrementing SSU count
    mpr0: Sending StopUnit: path (xpt0:mpr0:0:13:ffffffff): handle 13
    mpr0: Incrementing SSU count
    mpr0: Sending StopUnit: path (xpt0:mpr0:0:16:ffffffff): handle 16
    mpr0: Incrementing SSU count

My hunch is that, given this was inside kmem_malloc, we were unable to allocate memory for a zio_write_compress call (the pool does have ZFS compression on) and hence this is a tuning issue and not a bug... but I am looking for confirmation and/or suggested changes/troubleshooting steps. The ZFS tuning configuration has been stable for years, so it may be a change in behavior or traffic... If this looks like it might be a bug, I will be able to get more information from a minidump if it recurs and can follow up on this thread.

Any advice or suggestions are welcome!

[jgitlin@zfs3 ~]$ zpool status
  pool: srv
 state: ONLINE
  scan: scrub repaired 0 in 2h32m with 0 errors on Tue Sep 11 20:32:18 2018
config:

        NAME              STATE     READ WRITE CKSUM
        srv               ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            gpt/s5        ONLINE       0     0     0
            gpt/s9        ONLINE       0     0     0
          mirror-1        ONLINE       0     0     0
            gpt/s6        ONLINE       0     0     0
            gpt/s10       ONLINE       0     0     0
          mirror-2        ONLINE       0     0     0
            gpt/s7        ONLINE       0     0     0
            gpt/s11       ONLINE       0     0     0
          mirror-3        ONLINE       0     0     0
            gpt/s8        ONLINE       0     0     0
            gpt/s12       ONLINE       0     0     0
        logs
          mirror-4        ONLINE       0     0     0
            gpt/s2-zil    ONLINE       0     0     0
            gpt/s3-zil    ONLINE       0     0     0
        cache
          gpt/s2-cache    ONLINE       0     0     0
          gpt/s3-cache    ONLINE       0     0     0

errors: No known data errors

ZFS tuning:

vfs.zfs.delay_min_dirty_percent=90
vfs.zfs.dirty_data_max=4294967296
vfs.zfs.dirty_data_sync=3221225472
vfs.zfs.prefetch_disable=1
vfs.zfs.top_maxinflight=128
vfs.zfs.trim.txg_delay=8
vfs.zfs.txg.timeout=20
vfs.zfs.vdev.aggregation_limit=524288
vfs.zfs.vdev.scrub_max_active=3
vfs.zfs.l2arc_write_boost=134217728
vfs.zfs.l2arc_write_max=134217728
vfs.zfs.l2arc_feed_min_ms=200
vfs.zfs.min_auto_ashift=12

-- 
<http://www.goboomtown.com/>
Josh Gitlin
Senior DevOps Engineer
(415) 690-1610 x155
Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.
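
For reference when reading the ZFS tuning list above: several of the values are raw byte counts, which are hard to judge against the 15.6 GiB of RAM mentioned at the top of the message. The following is a minimal sketch in Python, not part of the original report, that converts those byte-valued tunables to binary units and expresses each as a share of RAM. The helper names `human` and `report` are made up for illustration, and the RAM figure is taken from the message rather than measured.

```python
# Byte-valued tunables copied from the "ZFS tuning" list in the message.
TUNABLES = {
    "vfs.zfs.dirty_data_max": 4294967296,
    "vfs.zfs.dirty_data_sync": 3221225472,
    "vfs.zfs.vdev.aggregation_limit": 524288,
    "vfs.zfs.l2arc_write_boost": 134217728,
    "vfs.zfs.l2arc_write_max": 134217728,
}

# Installed RAM as stated in the message (15.6 GiB), converted to bytes.
RAM_BYTES = 15.6 * 1024**3


def human(n):
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit where the value drops below 1024,
        # or at GiB, which is the largest unit handled here.
        if n < 1024 or unit == "GiB":
            return f"{n:g} {unit}"
        n /= 1024


def report(tunables=TUNABLES, ram_bytes=RAM_BYTES):
    """Return (name, human-readable size, percent of RAM) rows."""
    return [(name, human(value), 100 * value / ram_bytes)
            for name, value in tunables.items()]


if __name__ == "__main__":
    for name, size, pct in report():
        print(f"{name:32s} {size:>8s} {pct:6.2f}% of RAM")
```

Under these assumptions, vfs.zfs.dirty_data_max works out to 4 GiB (roughly a quarter of the stated RAM) and vfs.zfs.dirty_data_sync to 3 GiB, which may be a useful starting point when reviewing the tuning against memory pressure.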
