FreeBSD Mail Archives

Date:      Thu, 20 Sep 2018 20:27:02 -0400
From:      Josh Gitlin <jgitlin@goboomtown.com>
To:        freebsd-fs@freebsd.org
Subject:   Troubleshooting kernel panic with zfs
Message-ID:  <D54225AD-CC96-45E7-A203-D2C52E984963@goboomtown.com>



I am working to debug a kernel panic on a FreeBSD ZFS iSCSI server, specifically trying to determine whether it is a bug or (more likely) a misconfiguration in our settings. The server is running 11.2-RELEASE-p2 with 15.6 GiB of RAM and has a single zpool with four 2-way mirrored vdevs, a 2x mirrored ZIL and 2x L2ARC devices. It runs pretty much nothing other than SSH and iSCSI (via ctld) and serves VM virtual disks to hypervisor servers over a 10GbE LAN.

The server experienced a kernel panic, and unfortunately we did not have dumpdev set in /etc/rc.conf (we have since corrected this), so the only information I have is what was on the screen before I rebooted it. Because it is a production system, I could not investigate further and had to reboot ASAP.
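For anyone in the same position, the rc.conf change mentioned above would look roughly like the fragment below. This is a minimal sketch and not necessarily the exact lines now on this server: "AUTO" tells the system to pick a suitable swap device for crash dumps, and /var/crash is the default directory where savecore writes the dump on the next boot.

```sh
# /etc/rc.conf fragment (sketch; the values are assumptions, not copied from this server)
dumpdev="AUTO"          # use a configured swap device for kernel crash dumps
dumpdir="/var/crash"    # where savecore(8) stores the dump after reboot (the default)
```

The swap device needs to be large enough to hold a minidump, otherwise the dump will not be saved.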

trap number = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b3d567 at kdb_backtrace+0x67
#1 0xffffffff80af6b07 at vpanic+0x177
#2 0xffffffff80af6983 at panic+0x43
#3 0xffffffff80f77fcf at trap_fatal+0x35f
#4 0xffffffff80f78029 at trap_pfault+0x49
#5 0xffffffff80f777f7 at trap+0x2c7
#6 0xffffffff80f57dac at calltrap+0x8
#7 0xffffffff80dee7e2 at kmem_back+0xf2
#8 0xffffffff80dee6c0 at kmem_malloc+0x60
#9 0xffffffff80de6172 at keg_alloc_slab+0xe2
#10 0xffffffff80de8b7e at keg_fetch_slab+0x14e
#11 0xffffffff80de8364 at zone_fetch_slab+0x64
#12 0xffffffff80de848f at zone_import+0x3f
#13 0xffffffff80de4b99 at uma_zalloc_arg+0x3d9
#14 0xffffffff826e6ab2 at zio_write_compress+0x1e2
#15 0xffffffff826e574c at zio_execute+0xac
#16 0xffffffff80b1ed74 at taskqueue_run_locked+0x154
#17 0xffffffff80b4fed8 at taskqueue_thread_loop+0x98
Uptime: 18d18h31m6s
mpr0: Sending StopUnit: path (xpt0:mpr0:0:10:ffffffff): handle 10 
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:13:ffffffff): handle 13 
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:16:ffffffff): handle 16
mpr0: Incrementing SSU count

My hunch is that, given this was inside kmem_malloc, we were unable to allocate memory for a zio_write_compress call (the pool does have ZFS compression on), and hence this is a tuning issue and not a bug... but I am looking for confirmation and/or suggested changes or troubleshooting steps. The ZFS tuning configuration has been stable for years, so the cause may be a change in behavior or traffic... If this looks like it might be a bug, I will be able to get more information from a minidump if it recurs and can follow up on this thread.

Any advice or suggestions are welcome!

[jgitlin@zfs3 ~]$ zpool status
  pool: srv
 state: ONLINE
  scan: scrub repaired 0 in 2h32m with 0 errors on Tue Sep 11 20:32:18 2018
config:

	NAME            STATE     READ WRITE CKSUM
	srv             ONLINE       0     0     0
	  mirror-0      ONLINE       0     0     0
	    gpt/s5      ONLINE       0     0     0
	    gpt/s9      ONLINE       0     0     0
	  mirror-1      ONLINE       0     0     0
	    gpt/s6      ONLINE       0     0     0
	    gpt/s10     ONLINE       0     0     0
	  mirror-2      ONLINE       0     0     0
	    gpt/s7      ONLINE       0     0     0
	    gpt/s11     ONLINE       0     0     0
	  mirror-3      ONLINE       0     0     0
	    gpt/s8      ONLINE       0     0     0
	    gpt/s12     ONLINE       0     0     0
	logs
	  mirror-4      ONLINE       0     0     0
	    gpt/s2-zil  ONLINE       0     0     0
	    gpt/s3-zil  ONLINE       0     0     0
	cache
	  gpt/s2-cache  ONLINE       0     0     0
	  gpt/s3-cache  ONLINE       0     0     0

errors: No known data errors

ZFS tuning:

vfs.zfs.delay_min_dirty_percent=90
vfs.zfs.dirty_data_max=4294967296
vfs.zfs.dirty_data_sync=3221225472
vfs.zfs.prefetch_disable=1
vfs.zfs.top_maxinflight=128
vfs.zfs.trim.txg_delay=8
vfs.zfs.txg.timeout=20
vfs.zfs.vdev.aggregation_limit=524288
vfs.zfs.vdev.scrub_max_active=3
vfs.zfs.l2arc_write_boost=134217728
vfs.zfs.l2arc_write_max=134217728
vfs.zfs.l2arc_feed_min_ms=200
vfs.zfs.min_auto_ashift=12
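
To make the byte values above easier to compare against the 15.6 GiB of RAM, here is a small sh sketch that converts them to MiB/KiB. The variable names are only illustrative shell names, not sysctl OIDs. The delay threshold assumes the usual meaning of delay_min_dirty_percent, i.e. write throttling begins once dirty data exceeds that percentage of dirty_data_max.

```sh
#!/bin/sh
# Values copied from the tuning list above (bytes, except the percentage).
dirty_data_max=4294967296
dirty_data_sync=3221225472
delay_min_dirty_percent=90
l2arc_write_max=134217728
aggregation_limit=524288

mib=$((1024 * 1024))

# Convert to friendlier units (integer division, rounded down).
dirty_max_mib=$((dirty_data_max / mib))
dirty_sync_mib=$((dirty_data_sync / mib))
# Assumed meaning: throttling starts at this share of dirty_data_max.
delay_start_mib=$((dirty_max_mib * delay_min_dirty_percent / 100))
l2arc_write_mib=$((l2arc_write_max / mib))
aggregation_kib=$((aggregation_limit / 1024))

echo "dirty_data_max:        ${dirty_max_mib} MiB"
echo "dirty_data_sync:       ${dirty_sync_mib} MiB"
echo "write delay starts at: ${delay_start_mib} MiB dirty"
echo "l2arc_write_max:       ${l2arc_write_mib} MiB"
echo "aggregation_limit:     ${aggregation_kib} KiB"
```

In other words, up to 4 GiB of dirty data may be outstanding on a machine with 15.6 GiB of RAM, which is roughly a quarter of physical memory, in addition to whatever the ARC is holding.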


--
<http://www.goboomtown.com/>
Josh Gitlin
Senior DevOps Engineer
(415) 690-1610 x155

Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.


Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?D54225AD-CC96-45E7-A203-D2C52E984963>
