From: Sergey Kandaurov
To: Garrett Cooper
Cc: Kirk McKusick, FreeBSD Current
Date: Wed, 4 May 2011 13:05:05 +0400
References: <201105040559.p445xEJ5024585@chez.mckusick.com>
Subject: Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full

On 4 May 2011 10:42, Garrett Cooper wrote:
> On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick wrote:
>>> Date: Tue, 3 May 2011 22:40:26 -0700
>>> Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
>>>  partition when filesystem full
>>> From: Garrett Cooper
>>> To: Jeff Roberson,
>>>         Marshall Kirk McKusick
>>> Cc: FreeBSD Current
>>>
>>> Hi Jeff and Dr. McKusick,
>>>     Ran into this panic when /usr ran out of space doing a make
>>> universe on amd64/r221219 (it took ~15 minutes for the panic to occur
>>> after the filesystem ran out of space -- wasn't quite sure what it was
>>> doing at the time):
>>>
>>> ...
>>>
>>>     Let me know what other commands you would like for me to run in kgdb.
>>> Thanks,
>>> -Garrett
>>
>> You did not indicate whether you are running an 8.X system or a 9-current
>> system. It would be helpful to know that.
>
> I've actually been running CURRENT for a few years now, but you're right --
> I didn't mention that part.
>
>> Jeff thinks that there may be a potential race in the locking code for
>> softdep_request_cleanup.
>> If so, this patch for 9-current should fix it:
>>
>> Index: ffs_softdep.c
>> ===================================================================
>> --- ffs_softdep.c       (revision 221385)
>> +++ ffs_softdep.c       (working copy)
>> @@ -11380,7 +11380,8 @@
>>                                 continue;
>>                         }
>>                         MNT_IUNLOCK(mp);
>> -                       if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
>> +                       if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | LK_INTERLOCK,
>> +                           curthread)) {
>>                                 MNT_ILOCK(mp);
>>                                 continue;
>>                         }
>>

FYI,
I was playing with head (w/o the above patch) to reproduce the panic and got
this LOR (lock order reversal) once the filesystem eventually filled up.
I'm not sure the patch would fix the panic, but I think it should at least
fix the LOR.

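For anyone following along, here is a rough userland sketch of why a
non-blocking acquire (what LK_NOWAIT asks vget() for) sidesteps this kind
of ordering problem: a lock that is already held is skipped rather than
slept on, so the caller never waits while holding its own locks. This is
not kernel code. The names fake_vnode, cleanup_pass and demo_cleanup are
made up for illustration, and a C11 atomic_flag stands in for the vnode
lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; "busy" plays the role of its lock. */
struct fake_vnode {
	atomic_flag busy;
	int cleaned;
};

/* Non-blocking acquire: true if we got the lock, false if it was held. */
static bool
try_lock(struct fake_vnode *vp)
{
	return (!atomic_flag_test_and_set(&vp->busy));
}

static void
unlock(struct fake_vnode *vp)
{
	atomic_flag_clear(&vp->busy);
}

/*
 * One cleanup pass over n vnodes. A vnode whose lock is held is skipped
 * (the analogue of the "continue" after a failed vget() in the patch)
 * instead of being waited for. Returns how many vnodes were cleaned.
 */
static int
cleanup_pass(struct fake_vnode *vps, int n)
{
	int i, done = 0;

	assert(vps != NULL && n >= 0);
	for (i = 0; i < n; i++) {
		if (!try_lock(&vps[i]))
			continue;	/* held elsewhere: skip, do not sleep */
		vps[i].cleaned = 1;
		done++;
		unlock(&vps[i]);
	}
	return (done);
}

/* Three vnodes, the middle one already locked by "someone else". */
int
demo_cleanup(void)
{
	struct fake_vnode v[3] = {
		{ ATOMIC_FLAG_INIT, 0 },
		{ ATOMIC_FLAG_INIT, 0 },
		{ ATOMIC_FLAG_INIT, 0 },
	};
	int done;

	(void)try_lock(&v[1]);		/* simulate a lock held elsewhere */
	done = cleanup_pass(v, 3);
	assert(v[1].cleaned == 0);	/* the busy vnode was left alone */
	unlock(&v[1]);
	return (done);
}
```

Calling demo_cleanup() returns 2: the two free vnodes are cleaned and the
held one is skipped. The cost of this approach is that a pass may do less
work than a blocking one would, which seems acceptable for a best-effort
cleanup.
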
kernel: pid 66153 (dd), uid 0 inumber 4 on /mnt: filesystem full
lock order reversal:
 1st 0xfffffe001d7d3310 ufs (ufs) @ /usr/src/sys/kern/vfs_vnops.c:614
 2nd 0xffffff807ba8a800 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xfffffe001ade7588 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2126
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff802d9eba = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff80475d17 = kdb_backtrace+0x37
_witness_debugger() at 0xffffffff8048b4fe = _witness_debugger+0x2e
witness_checkorder() at 0xffffffff8048c7a7 = witness_checkorder+0x807
__lockmgr_args() at 0xffffffff80427553 = __lockmgr_args+0xd63
ffs_lock() at 0xffffffff806578fc = ffs_lock+0x9c
VOP_LOCK1_APV() at 0xffffffff806f285f = VOP_LOCK1_APV+0xbf
_vn_lock() at 0xffffffff804e87c7 = _vn_lock+0x57
vget() at 0xffffffff804dbb5b = vget+0x7b
softdep_request_cleanup() at 0xffffffff80649f31 = softdep_request_cleanup+0x311
ffs_alloc() at 0xffffffff80630b64 = ffs_alloc+0x134
ffs_balloc_ufs2() at 0xffffffff8063426c = ffs_balloc_ufs2+0x11ac
ffs_write() at 0xffffffff8065889f = ffs_write+0x22f
VOP_WRITE_APV() at 0xffffffff806f33dd = VOP_WRITE_APV+0x14d
vn_write() at 0xffffffff804e9a42 = vn_write+0x2a2
dofilewrite() at 0xffffffff8048df25 = dofilewrite+0x85
kern_writev() at 0xffffffff8048f740 = kern_writev+0x60
write() at 0xffffffff8048f845 = write+0x55
syscallenter() at 0xffffffff80483cbb = syscallenter+0x1cb
syscall() at 0xffffffff806abaf0 = syscall+0x60
Xfast_syscall() at 0xffffffff8069670d = Xfast_syscall+0xdd
--- syscall (4, FreeBSD ELF64, write), rip = 0x8009438fc, rsp = 0x7fffffffda68, rbp = 0xa00000 ---

-- 
wbr,
pluknet