From owner-freebsd-stable@FreeBSD.ORG Wed Sep 25 21:06:22 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 8EFC8E82 for ; Wed, 25 Sep 2013 21:06:22 +0000 (UTC) (envelope-from jmg@h2.funkthat.com) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 357B32434 for ; Wed, 25 Sep 2013 21:06:21 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r8PL6F4R054262 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 25 Sep 2013 14:06:16 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id r8PL6F71054261; Wed, 25 Sep 2013 14:06:15 -0700 (PDT) (envelope-from jmg) Date: Wed, 25 Sep 2013 14:06:15 -0700 From: John-Mark Gurney To: Konstantin Belousov Subject: Re: Possible kqueue related issue on STABLE/RC. Message-ID: <20130925210615.GS14220@funkthat.com> Mail-Followup-To: Konstantin Belousov , Patrick Lamaiziere , freebsd-stable@freebsd.org References: <20130923153708.45c3be3d@mr129166> <20130923203141.GV41229@kib.kiev.ua> <20130924094427.0f4b902a@mr129166> <20130924082909.GH41229@kib.kiev.ua> <20130924114738.60c700c9@mr129166> <20130924121434.GI41229@kib.kiev.ua> <20130924174517.GB14220@funkthat.com> <20130924212127.GQ41229@kib.kiev.ua> <20130925161954.GO14220@funkthat.com> <20130925194021.GB41229@kib.kiev.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130925194021.GB41229@kib.kiev.ua> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Wed, 25 Sep 2013 14:06:16 -0700 (PDT) Cc: freebsd-stable@freebsd.org, Patrick Lamaiziere X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Sep 2013 21:06:22 -0000 Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 22:40 +0300: > On Wed, Sep 25, 2013 at 09:19:54AM -0700, John-Mark Gurney wrote: > > Konstantin Belousov wrote this message on Wed, Sep 25, 2013 at 00:21 +0300: > > > On Tue, Sep 24, 2013 at 10:45:17AM -0700, John-Mark Gurney wrote: > > > > I'd like to understand why you think protecting these functions w/ > > > > the _DETACHED check is correct... In kern_event.c, all calls to > > > > f_detach are followed by knote_drop which will ensure that the knote > > > > is removed and free, so no more f_event calls will be called on that > > > > knote.. > > > > > > My current belief is that what happens is a glitch in the > > > kqueue_register(). After a new knote is created and attached, the kq > > > lock is dropped and then f_event() is called. If the vnode is reclaimed > > > or possible freed meantime, f_event() seems to dereference freed memory, > > > since kn_hook points to freed vnode. > > > > Well, if that happens, then the vnode isn't properly clearing up the > > knote before it gets reclaimed... It is the vnode's responsibility to > > make sure any knotes that are associated w/ it get cleaned up properly.. > See below. > > > > > > The issue as I see it is that vnode lifecycle is detached from the knote > > > lifecycle. Might be, only the second patch, which acquires a hold reference > > > on the vnode for each knote, is really needed. But before going into any > > > conclusions, I want to see the testing results. > > > > The vnode lifecycle can't/shouldn't be detached from the knote lifecycle > > since the knote contains a pointer to the vnode... There is the function > > knlist_clear that can be used to clean up knotes when the object goes > > away.. > This is done from the vdropl() (null hold count) -> destroy_vpollinfo(). > But this is too late, IMO. vdropl() is only executing with the vnode > interlock locked, and knote lock is vnode lock. This way, you might > get far enough into vdropl in other thread, while trying to operate on > a vnode with zero hold count in some kqueue code path. > > We do not drain the vnode lock holders when destroying vnode, because > VFS interface require that anybody locking the vnode own a hold reference > on it. My short patch should fix exactly this issue, hopefully we will see. Which clearly wasn't happening before... With the above, and rereading your patch, I understand how this patch should fix the issue and bring the life cycle of the two back into sync... > > I was looking at the code, is there a good reason why you do > > VI_LOCK/VI_UNLOCK to protect the knote fields instead of getting it in > > the vfs_knllock/vfs_knlunlock functions? Because kq code will modify > > the knote fields w/ only running the vfs_knllock/vfs_knlunlock functions, > > so either the VI_LOCK/VI_UNLOCK are unnecessary, or should be moved to > > vfs_knllock/vfs_knlunlock... > > vfs_knllock() is fine. The problematic case if the > VOP_{PRE,POST}->VFS_KNOTE->VN_KNOTE->KNOTE calls from the VOPs. If you > look at the vfs_knl_assert_locked(), you would note that the function > only asserts that vnode is locked, not that it is locked exclusively. > This is because some filesystems started require from VFS to do e.g. > VOP_WRITE() with the vnode only shared-locked, and KNOTE() is called > with shared-locked vnode lock. > > The vfs_knllock() obtain the exclusive lock on the vnode, so kqueue > callers are fine. Taking vnode interlock inside the filters provides > enough exclusion for the VOP callers. Ahh, ok, makes sense now.. Clearly I need to learn more about the VFS/vnope system.. :) Thanks for the explanations... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."