From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 17:26:49 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E1B31065672 for ; Mon, 28 Apr 2008 17:26:49 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 1AA318FC21 for ; Mon, 28 Apr 2008 17:26:49 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id 87A0D125438; Tue, 29 Apr 2008 02:26:48 +0900 (JST) Message-ID: <481608D8.1080308@freebsd.org> Date: Tue, 29 Apr 2008 02:26:48 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080423) MIME-Version: 1.0 To: Kostik Belousov References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> <4815E107.9030902@freebsd.org> <20080428162238.GT18958@deviant.kiev.zoral.com.ua> In-Reply-To: <20080428162238.GT18958@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 17:26:49 -0000 Kostik Belousov wrote: > On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote: >> Thanks for your response and explanation :) >> >> Kostik Belousov wrote: >>> On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: >>>> Kostik Belousov wrote: >>>>> On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: >>>>>> Hi Konstantin :) >>>>>> >>>>>> To fix a unionfs issue of >>>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=109377, >>>>>> we need to add new functions >>>>>> >>>>>> void vkernrele(struct vnode *vp); >>>>>> void vkernref(struct vnode *vp); >>>>>> >>>>>> and one value >>>>>> >>>>>> int v_kernusecount; /* i ref count of kernel */ >>>>>> >>>>>> to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. >>>>>> >>>>>> Unionfs will be panic when lower fs layer is forced umounted by >>>>>> "umount -f". So to avoid this issue, we've added >>>>>> "v_kernusecount" value that means "a vnode count that kernel are >>>>>> using". vkernrele() and vkernref() are functions that manage >>>>>> "v_kernusecount" value. >>>>>> >>>>>> Please check those and give us an approve or some comments! >>>>> There is already the vnode reference count. From your description, I >>>>> cannot >>>>> understand how the kernusecount would prevent the panic when forced >>>>> unmount >>>>> is performed. Could you, please, show the actual code ? PR mentioned >>>>> does not contain any patch. >>>> Our patch realizes avoiding kernel panic by "umount -f" operation using >>>> with >>>> EBUSY process. >>>> >>>> On current implementation (not applied our patch), "umount -f" tries to >>>> release vnode at any vnode reference count value. Since that, unionfs >>>> and nullfs access invalid vnode and lead kernel panic. To prevent this >>>> issue, we need a some kind of not-umount-accept-mechanism in invalid case >>>> (e.x. fs in unionfsed stack, it must be umounted in correct order) and >>>> to realize that, current vnode reference count is not enough we are >>>> thinking. >>>> >>>> If you have any ideas to realize the same solution with current vnode >>>> reference, would you please tell us your idea :) >>>> >>>>> The problem you described is common for the kernel code, and right way >>>>> to handle it, for now, is to keep refcount _and_ check for the forced >>>>> reclaim. >>> Your patch in essence disables the forced unmount. I would object against >>> such decision. >> Oooooo.... OK. We understand. >> >>> Even if taking this direction, I believe more cleaner solution would be >>> to introduce a counter that disables the (forced) unmount into the >>> struct mount, instead of the struct vnode. Having the counter in the >>> vnode, the unmount -f behaviour is non-deterministic and depended on >>> the presence of the cached vnodes of the upper layer. The mount counter >>> would be incremented by unionfs cover mount. But, as I said above, this >>> looks like a wrong solution. >>> >>> The right way to handle the forced reclaim with the current VFS is to >>> add the explicit checks for the reclaimed vnodes where it is needed. The >>> vnode cannot be reclaimed while the vnode lock is held. When obtaining >>> the vnode lock, the reclamation can be detected. For instance, the >>> vget() without LK_RETRY shall be checked for ENOENT. >> At last, we want to check that vnode is released or not where >> unionfs does not know. If we can do that check, our patch is >> not needed for solving that issue. >> >> Would you please give us the way to check that target vnode is >> released or not before accessing it. > > The basic rules of our VFS are: > 1. You _must_ hold the vnode unless the vnode is locked. Hold count > prevents the vnode memory from being reused and guarantees the > validity of the counters, v_vnlock, v_mount and vop (but please note > that validity != stability). E.g., v_mount may be NULLed and vop > become the deadfs_vop due to reclamation. > 2. The vnode lock is held when the vnode is vgone(9)'ed. In the other > words, if you have a pointer to the non-reclaimed vnode that > is locked, the vnode cannot be reclaimed until the lock is freed. > 3. The verbs that lock a vnode (vget() and vn_lock(9)) have two mode > of operations. > - If you specify the LK_RETRY in the lock flags, you would get > even the reclaimed vnode locked. > - If you do not specified LK_RETRY, you would get ENOENT for the > reclaimed vnode. > [See the #1 for the reason why you must have a vnode held while > calling vget() or vn_lock()]. > 4. The reclaimed vnode has the VI_DOOMED flag set; you must have vnode > interlock locked to check the context of the v_iflag. Most filesystems, > as opposed to the VFS, use the other technique to detect the reclaimed > vnode, if needed. They clear the v_data in the vop_reclaim, and > verification of the (v_data != NULL) is enough to check for reclamation. > > Very good example of the practical usage of the rules above are the > nullfs routines null_reclaim(), null_lock() and null_nodeget(). Thanks for your explanation! We'll try to research and get another new solution for this issue :) >>> You said that that nullfs is vulnerable to the problem. Could you, >>> please, point me to the corresponding stack trace ? At least, the nullfs >>> vop_lock() seems to carefully check the possible problems. -- Daichi GOTO, http://people.freebsd.org/~daichi