From owner-freebsd-current@FreeBSD.ORG  Sun Jul  6 20:55:17 2014
Date: Sun, 06 Jul 2014 13:55:07 -0700
From: Alfred Perlstein
To: freebsd-current@freebsd.org
Subject: Re: tmpfs panic
Message-ID: <53B9B7AB.4020909@mu.org>
In-Reply-To: <20140706181226.GE93733@kib.kiev.ua>
References: <20140706135333.GA80856@mouf.net> <20140706154621.GA81830@mouf.net>
 <20140706172511.GA84461@mouf.net> <20140706181226.GE93733@kib.kiev.ua>

On 7/6/14 11:12 AM, Konstantin Belousov wrote:
> On Sun, Jul 06, 2014 at 05:25:12PM +0000, Steve Wills wrote:
>> On Sun, Jul 06, 2014 at 12:28:07PM -0400, Ryan Stone wrote:
>>> On Sun, Jul 6, 2014 at 11:46 AM, Steve Wills wrote:
>>>> I should have noted this system is running in bhyve. Also I'm told this
>>>> panic may be related to the fact that the system is running in bhyve.
>>>>
>>>> Looking at it a little more closely:
>>>>
>>>> (kgdb) list *__mtx_lock_sleep+0xb1
>>>> 0xffffffff809638d1 is in __mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:431).
>>>> 426              * owner stops running or the state of the lock changes.
>>>> 427              */
>>>> 428             v = m->mtx_lock;
>>>> 429             if (v != MTX_UNOWNED) {
>>>> 430                     owner = (struct thread *)(v & ~MTX_FLAGMASK);
>>>> 431                     if (TD_IS_RUNNING(owner)) {
>>>> 432                             if (LOCK_LOG_TEST(&m->lock_object, 0))
>>>> 433                                     CTR3(KTR_LOCK,
>>>> 434                                         "%s: spinning on %p held by %p",
>>>> 435                                         __func__, m, owner);
>>>> (kgdb)
>>>>
>>>> I'm told that MTX_CONTESTED was set on the unlocked mtx and that it was
>>>> spuriously left behind, and to ask how the lock prefix is handled in
>>>> bhyve. Does any of that make sense to anyone?
>>> The mutex has both MTX_CONTESTED and MTX_UNOWNED set on it?  That is a
>>> special sentinel value that is set on a mutex when it is destroyed
>>> (see MTX_DESTROYED in sys/mutex.h).  If that is the case, it looks like
>>> you've stumbled upon some kind of use-after-free in tmpfs.  I doubt
>>> that bhyve is responsible (other than perhaps changing the timing
>>> around, making the panic more likely to happen).
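
For reference, the sentinel Ryan mentions is just those two flag bits taken
together. A minimal sketch of the relevant definitions, assuming a
sys/sys/mutex.h of roughly this vintage (the names are the real ones; exact
values and comments are approximate, check your tree):

/* Lock-word flag bits (sketch, not copied verbatim from any tree). */
#define	MTX_RECURSED	0x00000001	/* lock recursed (MTX_DEF only) */
#define	MTX_CONTESTED	0x00000002	/* lock contested (MTX_DEF only) */
#define	MTX_UNOWNED	0x00000004	/* cookie for an unowned mutex */
#define	MTX_FLAGMASK	(MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

/* mtx_destroy() parks the lock word at this value. */
#define	MTX_DESTROYED	(MTX_CONTESTED | MTX_UNOWNED)

Note that for v == MTX_DESTROYED, (v & ~MTX_FLAGMASK) is 0, so the owner
computed at kern_mutex.c:430 in the listing above would be NULL and
TD_IS_RUNNING(owner) at line 431 would dereference it, which would be
consistent with the reported trap location.
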
>> Given the first thing seen was:
>>
>> Freed UMA keg (TMPFS node) was not empty (16 items). Lost 1 pages of memory.
>>
>> this sounds reasonable to me.
>>
>> What can I do to help find and eliminate the source of the error?
> The most worrying fact there is that the mutex which is causing trouble
> cannot be anything other than allnode_lock, from the backtrace.  For this
> mutex to be destroyed, the unmount of the corresponding mount point must
> run to completion.
>
> In particular, it must get past the vflush(9) call in tmpfs_unmount().
> This call reclaims all vnodes belonging to the unmounted mount point.
> New vnodes cannot be instantiated in the meantime, since insmntque(9) is
> blocked by the MNTK_UNMOUNT flag.
>
> That said, the backtrace indicates that we have a live vnode which is
> being reclaimed, and also a mutex which is in the destroyed (?) state.
> My basic claim is that the two events cannot co-exist; at the least,
> this code path has been heavily exercised and most issues were fixed
> over several years.
>
> I cannot exclude the possibility of tmpfs/VFS screwing things up,
> but given the above reasoning, and the fact that this is the first
> appearance of the MTX_DESTROYED problem for the tmpfs unmounting code,
> which has not changed for a long time, I would at least ask some questions
> about bhyve.  I.e., I would rather look first at the locked-prefix
> emulation than at tmpfs.

What about running the code with INVARIANTS + DEBUG_VFS_LOCKS and seeing if
anything shakes out?

-Alfred

-- 
Alfred Perlstein
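
For anyone trying that, it would be something like options INVARIANTS,
INVARIANT_SUPPORT and DEBUG_VFS_LOCKS in the kernel config. With INVARIANTS,
the mtx_lock() path in kern_mutex.c gains checks along these lines (a
paraphrased sketch, not a verbatim quote of the tree in question):

	/* Catch locking of an already-destroyed mutex immediately. */
	KASSERT(m->mtx_lock != MTX_DESTROYED,
	    ("mtx_lock() of destroyed mutex @ %s:%d", file, line));

So if tmpfs really does take allnode_lock after tmpfs_unmount() has destroyed
it, the panic message should point straight at the offending call site instead
of a page fault on a bogus owner pointer, while DEBUG_VFS_LOCKS turns the
ASSERT_VOP_*LOCKED() checks in the VFS paths into real assertions that may
catch the problem earlier in the unmount/reclaim path.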