From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 236220] ZFS vnode deadlock
Date: Mon, 04 Mar 2019 14:46:30 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236220

            Bug ID: 236220
           Summary: ZFS vnode deadlock
           Product: Base System
           Version: 12.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ncrogers@gmail.com

Created attachment 202551
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=202551&action=edit
procstat + gdb

Recently a number of my production 12.0 systems have experienced what I
can only gather is a ZFS deadlock related to vnodes.
It seems similar to the relatively recent FreeBSD-EN-18:18.zfs (ZFS vnode
reclaim deadlock) problem. Previously the same systems were running
11.1-RELEASE without problems. Threads are always stuck with a stack
around vn_lock->zfs_root->lookup->namei. When the system is in this
state, a simple `ls /` or `ls /tmp` always hangs, but other datasets seem
unaffected. I have a fairly straightforward ZFS root setup on a single
pool with one SSD. The workload is a ruby/rails/nginx/postgresql backed
web application combined with some data warehousing and other periodic
tasks.

Sometimes I can SSH in remotely; other times that fails because the
user's shell fails to load, but I can still run commands via
`ssh ... command`. Sometimes the system is not accessible remotely at
all, or it eventually becomes inaccessible if left long enough in this
state. I always have to physically reboot the device because the shutdown
procedure fails. The network stack (e.g. ping) seems to work completely
fine whilst this is going on, until you try to interact with or spawn a
process that hits the deadlock.

As with previous similar ZFS deadlock issues, increasing kern.maxvnodes
seems to make the system last longer, by up to a few weeks, but it is
still a band-aid. However, I have yet to witness vnode usage actually
getting close to the maximum.

I haven't had any luck reproducing this reliably, but eventually it
happens after a few days or a few weeks. I managed to connect to a system
in this state, grab procstat output, and get (hopefully) something useful
out of kgdb. I will note that although I was able to install debug
symbols, I couldn't manage to get the source files onto it for kgdb
purposes before I lost SSH access.

Attached is an abbreviated procstat and what I was able to get out of
kgdb for an affected thread. Note that the thread backtrace is from a
simple `ls` command.

-- 
You are receiving this mail because:
You are the assignee for the bug.