From owner-freebsd-fs@FreeBSD.ORG  Mon Apr 28 17:26:49 2008
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4E1B31065672
	for <fs@freebsd.org>; Mon, 28 Apr 2008 17:26:49 +0000 (UTC)
	(envelope-from daichi@freebsd.org)
Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90])
	by mx1.freebsd.org (Postfix) with ESMTP id 1AA318FC21
	for <fs@freebsd.org>; Mon, 28 Apr 2008 17:26:49 +0000 (UTC)
	(envelope-from daichi@freebsd.org)
Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94])
	by natial.ongs.co.jp (Postfix) with ESMTP id 87A0D125438;
	Tue, 29 Apr 2008 02:26:48 +0900 (JST)
Message-ID: <481608D8.1080308@freebsd.org>
Date: Tue, 29 Apr 2008 02:26:48 +0900
From: Daichi GOTO <daichi@freebsd.org>
User-Agent: Thunderbird 2.0.0.12 (X11/20080423)
MIME-Version: 1.0
To: Kostik Belousov <kostikbel@gmail.com>
References: <4811B0A0.8040702@freebsd.org>
	<20080426100116.GL18958@deviant.kiev.zoral.com.ua>
	<48159AC5.3030000@freebsd.org>
	<20080428132413.GS18958@deviant.kiev.zoral.com.ua>
	<4815E107.9030902@freebsd.org>
	<20080428162238.GT18958@deviant.kiev.zoral.com.ua>
In-Reply-To: <20080428162238.GT18958@deviant.kiev.zoral.com.ua>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: fs@freebsd.org
Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c
 and src/sys/sys/vnode.h
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 28 Apr 2008 17:26:49 -0000

Kostik Belousov wrote:
> On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote:
>> Thanks for your response and explanation :)
>>
>> Kostik Belousov wrote:
>>> On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote:
>>>> Kostik Belousov wrote:
>>>>> On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote:
>>>>>> Hi Konstantin :)
>>>>>>
>>>>>> To fix a unionfs issue of 
>>>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=109377,
>>>>>> we need to add new functions
>>>>>>
>>>>>>  void vkernrele(struct vnode *vp);
>>>>>>  void vkernref(struct vnode *vp);
>>>>>>
>>>>>> and one value
>>>>>>
>>>>>>  int	v_kernusecount; /* i ref count of kernel */
>>>>>>
>>>>>> to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c.
>>>>>>
>>>>>> Unionfs will be panic when lower fs layer is forced umounted by
>>>>>> "umount -f".  So to avoid this issue, we've added
>>>>>> "v_kernusecount" value that means "a vnode count that kernel are
>>>>>> using".  vkernrele() and vkernref() are functions that manage
>>>>>> "v_kernusecount" value.
>>>>>>
>>>>>> Please check those and give us an approve or some comments!
>>>>> There is already the vnode reference count. From your description, I 
>>>>> cannot
>>>>> understand how the kernusecount would prevent the panic when forced 
>>>>> unmount
>>>>> is performed. Could you, please, show the actual code ? PR mentioned
>>>>> does not contain any patch.
>>>> Our patch realizes avoiding kernel panic by "umount -f" operation using 
>>>> with
>>>> EBUSY process.
>>>>
>>>> On current implementation (not applied our patch), "umount -f" tries to
>>>> release vnode at any vnode reference count value. Since that, unionfs
>>>> and nullfs access invalid vnode and lead kernel panic. To prevent this
>>>> issue, we need a some kind of not-umount-accept-mechanism in invalid case
>>>> (e.x. fs in unionfsed stack, it must be umounted in correct order) and
>>>> to realize that, current vnode reference count is not enough we are 
>>>> thinking.
>>>>
>>>> If you have any ideas to realize the same solution with current vnode
>>>> reference, would you please tell us your idea :)
>>>>
>>>>> The problem you described is common for the kernel code, and right way
>>>>> to handle it, for now, is to keep refcount _and_ check for the forced
>>>>> reclaim.
>>> Your patch in essence disables the forced unmount. I would object against
>>> such decision.
>> Oooooo....   OK. We understand.
>>
>>> Even if taking this direction, I believe more cleaner solution would be
>>> to introduce a counter that disables the (forced) unmount into the
>>> struct mount, instead of the struct vnode. Having the counter in the
>>> vnode, the unmount -f behaviour is non-deterministic and depended on
>>> the presence of the cached vnodes of the upper layer. The mount counter
>>> would be incremented by unionfs cover mount. But, as I said above, this
>>> looks like a wrong solution.
>>>
>>> The right way to handle the forced reclaim with the current VFS is to
>>> add the explicit checks for the reclaimed vnodes where it is needed. The
>>> vnode cannot be reclaimed while the vnode lock is held. When obtaining
>>> the vnode lock, the reclamation can be detected. For instance, the
>>> vget() without LK_RETRY shall be checked for ENOENT.
>> At last, we want to check that vnode is released or not where
>> unionfs does not know. If we can do that check, our patch is
>> not needed for solving that issue.
>>
>> Would you please give us the way to check that target vnode is
>> released or not before accessing it.
> 
> The basic rules of our VFS are:
> 1. You _must_ hold the vnode unless the vnode is locked. Hold count
>    prevents the vnode memory from being reused and guarantees the
>    validity of the counters, v_vnlock, v_mount and vop (but please note
>    that validity != stability). E.g., v_mount may be NULLed and vop
>    become the deadfs_vop due to reclamation.
> 2. The vnode lock is held when the vnode is vgone(9)'ed. In the other
>    words, if you have a pointer to the non-reclaimed vnode that
>    is locked, the vnode cannot be reclaimed until the lock is freed.
> 3. The verbs that lock a vnode (vget() and vn_lock(9)) have two mode
>    of operations.
>    - If you specify the LK_RETRY in the lock flags, you would get
>      even the reclaimed vnode locked.
>    - If you do not specified LK_RETRY, you would get ENOENT for the
>      reclaimed vnode.
>    [See the #1 for the reason why you must have a vnode held while
>     calling vget() or vn_lock()].
> 4. The reclaimed vnode has the VI_DOOMED flag set; you must have vnode
>    interlock locked to check the context of the v_iflag. Most filesystems,
>    as opposed to the VFS, use the other technique to detect the reclaimed
>    vnode, if needed. They clear the v_data in the vop_reclaim, and
>    verification of the (v_data != NULL) is enough to check for reclamation.
> 
> Very good example of the practical usage of the rules above are the
> nullfs routines null_reclaim(), null_lock() and null_nodeget().

Thanks for your explanation!  We'll try to research and get another
new solution for this issue :)

>>> You said that that nullfs is vulnerable to the problem. Could you,
>>> please, point me to the corresponding stack trace ? At least, the nullfs
>>> vop_lock() seems to carefully check the possible problems.

-- 
   Daichi GOTO, http://people.freebsd.org/~daichi