Date: Mon, 04 Mar 2019 14:46:30 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 236220] ZFS vnode deadlock
Message-ID: <bug-236220-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236220

Bug ID: 236220
Summary: ZFS vnode deadlock
Product: Base System
Version: 12.0-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: ncrogers@gmail.com

Created attachment 202551
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=202551&action=edit
procstat + gdb

Recently a number of my production 12.0 systems have experienced what I can only gather is a ZFS deadlock related to vnodes. It seems similar to the relatively recent FreeBSD-EN-18:18.zfs (ZFS vnode reclaim deadlock) problem. Previously the same systems were running 11.1-RELEASE without problems.

Threads are always stuck with the stack around vn_lock->zfs_root->lookup->namei. When the system is in this state, a simple `ls /` or `ls /tmp` always hangs, but other datasets seem unaffected. I have a fairly straightforward ZFS root setup on a single pool with one SSD. The workload is a ruby/rails/nginx/postgresql-backed web application combined with some data warehousing and other periodic tasks.

Sometimes I can SSH in remotely; other times that fails because the user shell fails to load, although I can still run commands via `ssh ... command`. Sometimes the system is not accessible remotely at all, or it eventually becomes inaccessible if left long enough in this state. I always have to physically reboot the device because the shutdown procedure fails. The network stack (e.g. ping) seems to work completely fine while this is going on, until you try to interact with or spawn a process that hits the deadlock.

Like previous similar ZFS deadlock issues, increasing kern.maxvnodes seems to make the system last longer, by up to a few weeks, but it is still a band-aid. However, I have yet to witness vnode usage actually getting close to the maximum. I haven't had any luck reproducing this reliably, but eventually it happens after a few days or a few weeks...
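For anyone watching a box for this condition, a minimal sketch of the vnode-usage check described above (assumptions: the standard FreeBSD sysctls vfs.numvnodes and kern.maxvnodes; the fallbacks only keep the script from erroring out on systems where they are absent):

```shell
# Compare current vnode usage against the limit (assumed sysctls:
# vfs.numvnodes and kern.maxvnodes; "echo 0" fallbacks are defensive).
num=$(sysctl -n vfs.numvnodes 2>/dev/null || echo 0)
max=$(sysctl -n kern.maxvnodes 2>/dev/null || echo 0)
echo "vnodes in use: ${num} of ${max}"
# Raising the limit (the band-aid mentioned above) would look like, e.g.:
#   sysctl kern.maxvnodes=2000000
```

Per the report, usage never actually approaches the maximum before the hang, so this check is more useful for ruling the limit out than for predicting the deadlock.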
I managed to connect to a system in this state, grab a procstat, and get (hopefully) something useful out of kgdb. I will note that although I was able to install debug symbols, I couldn't manage to get the source files onto it for kgdb purposes before I lost SSH access. Attached is an abbreviated procstat and what I was able to get out of kgdb for an affected thread. Note that the thread backtrace is from a simple `ls` command.
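A sketch of the capture steps used here, guarded so it is a no-op on systems without the FreeBSD tools (the output file name is illustrative, not from the report):

```shell
# Capture kernel stacks for all threads; stuck threads should show the
# vn_lock/zfs_root/lookup/namei chain described above.
out="/var/tmp/procstat.$(date +%Y%m%d%H%M%S).txt"
if command -v procstat >/dev/null 2>&1; then
    # -kk prints the kernel stack trace of each thread, -a covers all PIDs
    procstat -kk -a > "$out" 2>&1
fi
# With kernel debug symbols installed, a live-kernel kgdb session can
# recover the same stacks interactively, e.g.:
#   kgdb /boot/kernel/kernel /dev/mem
#   (kgdb) thread <tid>
#   (kgdb) bt
echo "kernel stacks (if captured) in: $out"
```

Grabbing procstat output early is worthwhile, since SSH access tends to degrade as more processes hit the deadlock.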