From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Konstantin Belousov, "current@freebsd.org", Roger Pau Monné
Subject: Re: FreeBSD-HEAD gets stuck on vnode operations
Date: Mon, 20 May 2013 14:34:55 -0400
Message-Id: <201305201434.55406.jhb@freebsd.org>
In-Reply-To: <51927143.4080102@citrix.com>
References: <5190CBEC.5000704@citrix.com> <20130514163149.GS3047@kib.kiev.ua> <51927143.4080102@citrix.com>

On Tuesday, May 14, 2013 1:15:47 pm Roger Pau Monné wrote:
> On 14/05/13 18:31, Konstantin Belousov wrote:
> > On Tue, May 14, 2013 at 06:08:45PM +0200, Roger Pau Monné wrote:
> >> On 13/05/13 17:00, Konstantin Belousov wrote:
> >>> On Mon, May 13, 2013 at 04:33:04PM +0200, Roger Pau Monné wrote:
> >>>> On 13/05/13 13:18, Roger Pau Monné wrote:
> >>
> >> Thanks for taking a look,
> >>
> >>>> I would like to explain this a little bit more, the syncer process
> >>>> doesn't get blocked on the _mtx_trylock_flags_ call, it just continues
> >>>> looping forever in what seems to be an endless loop around
> >>>> mnt_vnode_next_active/ffs_sync. Also while in this state there is no
> >>>> noticeable disk activity, so I'm unsure of what is happening.
> >>> How many CPUs does your VM have ?
> >>
> >> 7 vCPUs, but I've also seen this issue with 4 and 16 vCPUs.
> >>
> >>>
> >>> The loop you describing means that other thread owns the vnode
> >>> interlock. Can you track what this thread does ? E.g. look at the
> >>> vp->v_interlock.mtx_lock, which is basically a pointer to the struct
> >>> thread owning the mutex, clear low bits as needed. Then you can
> >>> inspect the thread and get a backtrace.
> >>
> >> There are no other threads running, only syncer is running on CPU 1 (see
> >> ps in previous email). All other CPUs are idle, and as seen from the ps
> >> quite a lot of threads are blocked in vnode related operations, either
> >> "*Name Cac", "*vnode_fr" or "*vnode in". I've also attached the output
> >> of alllocks in the previous email.
> > This is not useful. You need to look at the mutex which fails the
> > trylock operation in the mnt_vnode_next_active(), see who owns it,
> > and then 'unwind' the locking dependencies from there.
>
> Sorry, now I get it, let's see if I can find the locked vnodes and the
> thread that owns them...

You can use 'show lock <address of v_interlock>' to find an owning thread and
then use 'show sleepchain <thread>'.  If you are using kgdb on the live system
(probably easier) then you can grab my scripts at www.freebsd.org/~jhb/gdb/
(do 'cd /path/to/scripts; source gdb6').  You can then find the offending
thread and do 'mtx_owner &vp->v_interlock' and then 'sleepchain <thread>'.

-- 
John Baldwin
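
For anyone following along, the "clear low bits as needed" step quoted above can be sketched in plain C. This is only an illustration of the masking idea, not the kernel's code: the sk_ names are hypothetical, and the flag values are an assumption modeled on sys/mutex.h, so check them against your own tree before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed flag bits kept in the low bits of the lock word.  These values
 * are modeled on FreeBSD's sys/mutex.h and may differ in your tree.
 */
#define SK_MTX_RECURSED   ((uintptr_t)0x1)
#define SK_MTX_CONTESTED  ((uintptr_t)0x2)
#define SK_MTX_UNOWNED    ((uintptr_t)0x4)
#define SK_MTX_FLAGMASK   (SK_MTX_RECURSED | SK_MTX_CONTESTED | SK_MTX_UNOWNED)

/*
 * Hypothetical stand-ins for struct thread and struct mtx.  The alignment
 * guarantees that the low bits of a thread pointer are zero, which is what
 * makes it possible to store flags there.
 */
struct sk_thread {
	_Alignas(16) int td_tid;
};

struct sk_mtx {
	volatile uintptr_t mtx_lock;	/* owner pointer OR'ed with flag bits */
};

/* Return the owning thread, or NULL if the lock word holds only flags. */
static struct sk_thread *
sk_mtx_owner(const struct sk_mtx *m)
{
	uintptr_t owner = m->mtx_lock & ~SK_MTX_FLAGMASK;

	return (owner == 0 ? NULL : (struct sk_thread *)owner);
}
```

With a lock word read from a debugger, the same mask applied by hand gives the address of the owning thread, which is what the thread inspection and backtrace steps then start from.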