Date:      Thu, 20 Sep 2018 20:27:02 -0400
From:      Josh Gitlin <jgitlin@goboomtown.com>
To:        freebsd-fs@freebsd.org
Subject:   Troubleshooting kernel panic with zfs
Message-ID:  <D54225AD-CC96-45E7-A203-D2C52E984963@goboomtown.com>

I am working to debug/troubleshoot a kernel panic with a FreeBSD ZFS
iSCSI server, specifically trying to determine if it's a bug or (more
likely) a misconfiguration in our settings. The server is running
11.2-RELEASE-p2 with 15.6 GiB of RAM and has a single zpool with 4x2
mirrored vdevs, 2x mirrored ZIL and 2x L2ARC. It runs pretty much
nothing other than SSH and iSCSI (via ctld) and serves VM virtual disks
to hypervisor servers over a 10GbE LAN.

The server experienced a kernel panic, and we unfortunately did not have
dumpdev set in /etc/rc.conf (we have since corrected this), so the only
info I have is what was on the screen before I rebooted it. (Because
it's a production system, I couldn't mess around and had to reboot ASAP.)
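
The rc.conf setting mentioned above typically looks like the following.
This is a generic sketch based on the dumpdev and dumpdir variables
documented in rc.conf(5), not a copy of this server's file; the values
shown are assumptions:

```shell
# /etc/rc.conf fragment (illustrative values, not taken from this server)
# Let the system pick a configured swap device for kernel crash dumps.
dumpdev="AUTO"
# Directory where savecore(8) stores the dump on the next boot.
dumpdir="/var/crash"
```

With this in place, a later panic should leave a dump for savecore(8)
to collect on the next boot, provided the dump device is large enough.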

trap number = 12
panic: page fault
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff80b3d567 at kdb_backtrace+0x67
#1 0xffffffff80af6b07 at vpanic+0x177
#2 0xffffffff80af6983 at panic+0x43
#3 0xffffffff80f77fcf at trap_fatal+0x35f
#4 0xffffffff80f78029 at trap_pfault+0x49
#5 0xffffffff80f777f7 at trap+0x2c7
#6 0xffffffff80f57dac at calltrap+0x8
#7 0xffffffff80dee7e2 at kmem_back+0xf2
#8 0xffffffff80dee6c0 at kmem_malloc+0x60
#9 0xffffffff80de6172 at keg_alloc_slab+0xe2
#10 0xffffffff80de8b7e at keg_fetch_slab+0x14e
#11 0xffffffff80de8364 at zone_fetch_slab+0x64
#12 0xffffffff80de848f at zone_import+0x3f
#13 0xffffffff80de4b99 at uma_zalloc_arg+0x3d9
#14 0xffffffff826e6ab2 at zio_write_compress+0x1e2
#15 0xffffffff826e574c at zio_execute+0xac
#16 0xffffffff80bled74 at taskqueue_run_locked+0x154
#17 0xffffffff80b4fed8 at taskqueue_thread_loop+0x98
Uptime: 18d18h31m6s
mpr0: Sending StopUnit: path (xpt0:mpr0:0:10:ffffffff): handle 10
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:13:ffffffff): handle 13
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:16:ffffffff): handle 16
mpr0: Incrementing SSU count

My hunch is that, given this was inside kmem_malloc, we were unable to
allocate memory for a zio_write_compress call (the pool does have ZFS
compression on) and hence this is a tuning issue and not a bug... but I
am looking for confirmation and/or suggested changes/troubleshooting
steps. The ZFS tuning configuration has been stable for years, so it may
be a change in behavior or traffic... If this looks like it might be a
bug, I will be able to get more information from a minidump if it
recurs and can follow up on this thread.

Any advice or suggestions are welcome!

[jgitlin@zfs3 ~]$ zpool status
  pool: srv
 state: ONLINE
  scan: scrub repaired 0 in 2h32m with 0 errors on Tue Sep 11 20:32:18 2018
config:

	NAME            STATE     READ WRITE CKSUM
	srv             ONLINE       0     0     0
	  mirror-0      ONLINE       0     0     0
	    gpt/s5      ONLINE       0     0     0
	    gpt/s9      ONLINE       0     0     0
	  mirror-1      ONLINE       0     0     0
	    gpt/s6      ONLINE       0     0     0
	    gpt/s10     ONLINE       0     0     0
	  mirror-2      ONLINE       0     0     0
	    gpt/s7      ONLINE       0     0     0
	    gpt/s11     ONLINE       0     0     0
	  mirror-3      ONLINE       0     0     0
	    gpt/s8      ONLINE       0     0     0
	    gpt/s12     ONLINE       0     0     0
	logs
	  mirror-4      ONLINE       0     0     0
	    gpt/s2-zil  ONLINE       0     0     0
	    gpt/s3-zil  ONLINE       0     0     0
	cache
	  gpt/s2-cache  ONLINE       0     0     0
	  gpt/s3-cache  ONLINE       0     0     0

errors: No known data errors

ZFS tuning:

vfs.zfs.delay_min_dirty_percent=90
vfs.zfs.dirty_data_max=4294967296
vfs.zfs.dirty_data_sync=3221225472
vfs.zfs.prefetch_disable=1
vfs.zfs.top_maxinflight=128
vfs.zfs.trim.txg_delay=8
vfs.zfs.txg.timeout=20
vfs.zfs.vdev.aggregation_limit=524288
vfs.zfs.vdev.scrub_max_active=3
vfs.zfs.l2arc_write_boost=134217728
vfs.zfs.l2arc_write_max=134217728
vfs.zfs.l2arc_feed_min_ms=200
vfs.zfs.min_auto_ashift=12
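
The byte-valued tunables above are easier to weigh against the 15.6 GiB
of RAM once converted to binary units. A minimal sketch in POSIX sh; the
helper name to_mib is made up for illustration, and the values are
copied from the list above rather than read from the running system:

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole MiB (integer division).
to_mib() {
    echo $(( $1 / 1048576 ))
}

# Values copied verbatim from the tuning list above.
for kv in \
    vfs.zfs.dirty_data_max=4294967296 \
    vfs.zfs.dirty_data_sync=3221225472 \
    vfs.zfs.l2arc_write_boost=134217728 \
    vfs.zfs.l2arc_write_max=134217728
do
    name=${kv%%=*}     # text before the first '='
    bytes=${kv#*=}     # text after the first '='
    echo "$name = $(to_mib "$bytes") MiB"
done
```

That is, dirty_data_max is 4 GiB (roughly a quarter of the 15.6 GiB of
RAM), dirty_data_sync is 3 GiB, and both L2ARC write limits are 128 MiB.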


--
 <http://www.goboomtown.com/>
Josh Gitlin
Senior DevOps Engineer
(415) 690-1610 x155

Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.



