Date: Fri, 18 Aug 2006 20:20:01 +0000 (UTC)
Message-Id: <20060818.202001.74745664.Tor.Egge@cvsup.no.freebsd.org>
To: kostikbel@gmail.com
From: Tor Egge
In-Reply-To: <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
References: <20060817113203.GD1483@deviant.kiev.zoral.com.ua> <20060817170314.GA17490@peter.osted.lan> <20060818164903.GF20768@deviant.kiev.zoral.com.ua>
Cc: freebsd-fs@freebsd.org, tegge@freebsd.org
Subject: Re: Deadlock between nfsd and snapshots.

> First, big thanks to Peter for helping debug the problem!
>
> This deadlock happens between processes 764 (nfsd) and 62981 (mksnap_ffs).
> In fact, the deadlock is not specific to nfsd. It happens when ufs_inactive()
> interposes with ffs_snapshot.
[snip]
> On the other hand, ufs_inactive calls vn_start_secondary_write(vp, XXX,
> V_WAIT). ufs_inactive is running with the vnode locked. If this happens at
> the right time, the system will deadlock.
>
> nfsd is the most vulnerable to the problem because it is often the
> only (and last) user of a vnode, so vput() from nfsd has a high chance of
> resulting in vinactive().
>
> Below is the patch that sets VI_OWEINACT for the inode if the last call to
> vn_start_sec_write(..., V_NOWAIT) fails. The return from that point is safe
> because mp == NULL means that no previous code that changes the inode was
> executed.
> Please review and test.

The deadlock indicates that one or more of IN_CHANGE, IN_MODIFIED or
IN_UPDATE was set on the inode, indicating a write operation
(e.g. VOP_WRITE(), VOP_RENAME(), VOP_CREATE(), VOP_REMOVE(), VOP_LINK(),
VOP_SYMLINK(), VOP_SETATTR(), VOP_MKDIR(), VOP_RMDIR(), VOP_MKNOD()) that was
not protected by vn_start_write() or vn_start_secondary_write().

The suspension of the file system should have cleared those flags on all
related inodes. Write operations protected by vn_start_write() should have
blocked without holding any vnode lock until the file system was resumed,
while write operations protected by vn_start_secondary_write() should have
triggered a retry of the vnode sync loop in ffs_sync().

Such unprotected write operations might render the snapshot inconsistent.

Your patch addresses the deadlock symptom but not the cause.

- Tor Egge
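
Appended illustration (not part of the original message): the argument above can be sketched as a small single-threaded user-space model. This is only a sketch under stated assumptions, not FreeBSD kernel code. Every model_* name, the M_IN_* flag values and the inact_result enum are hypothetical stand-ins; the real vn_start_write(), ufs_inactive() and IN_* flags behave in ways this model simplifies heavily. It shows that a protected write leaves no dirty flags behind once the file system is suspended, while a write that bypasses the gate leaves flags that force the inactive path to either wait with the vnode locked or defer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for IN_CHANGE, IN_UPDATE, IN_MODIFIED. */
enum { M_IN_CHANGE = 1, M_IN_UPDATE = 2, M_IN_MODIFIED = 4 };
#define M_DIRTY (M_IN_CHANGE | M_IN_UPDATE | M_IN_MODIFIED)

struct model_mount { bool suspended; int writers; };
struct model_inode { int flags; bool owe_inact; };

enum inact_result { INACT_DONE, INACT_DEFERRED, INACT_WOULD_DEADLOCK };

/* Stand-in for vn_start_write(): called before any vnode lock is taken.
 * Returns false where the real call would sleep until resume. */
bool model_start_write(struct model_mount *mp)
{
    if (mp->suspended)
        return false;
    mp->writers++;
    return true;
}

void model_finish_write(struct model_mount *mp)
{
    assert(mp->writers > 0);
    mp->writers--;
}

/* The write itself: marks the inode dirty. It does not check the gate,
 * which is exactly what an unprotected write operation looks like. */
void model_write(struct model_inode *ip)
{
    ip->flags |= M_IN_CHANGE | M_IN_UPDATE;
}

/* Suspension succeeds only with no registered writers, and the sync
 * that accompanies it clears the dirty flags. */
bool model_suspend(struct model_mount *mp, struct model_inode *ip)
{
    if (mp->writers > 0)
        return false;
    mp->suspended = true;
    ip->flags &= ~M_DIRTY;
    return true;
}

/* Stand-in for the inactive path, entered with the vnode locked. */
enum inact_result model_inactive(struct model_mount *mp,
    struct model_inode *ip, bool nowait)
{
    if ((ip->flags & M_DIRTY) == 0)
        return INACT_DONE;            /* nothing to write back */
    if (!mp->suspended) {
        ip->flags &= ~M_DIRTY;        /* write back immediately */
        return INACT_DONE;
    }
    if (nowait) {
        ip->owe_inact = true;         /* defer, as with VI_OWEINACT */
        return INACT_DEFERRED;
    }
    return INACT_WOULD_DEADLOCK;      /* would sleep holding the vnode lock */
}
```

In this model, a write bracketed by model_start_write() and model_finish_write() followed by model_suspend() leaves the inode clean, so model_inactive() returns INACT_DONE. Calling model_write() after suspension without the gate leaves dirty flags, and model_inactive() then returns INACT_WOULD_DEADLOCK when waiting or INACT_DEFERRED when not. The deferral avoids the hang but leaves the stray dirty flags unexplained, which is the distinction between symptom and cause drawn above.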