Date:      Wed, 12 May 2010 10:44:34 -1000 (HST)
From:      Jeff Roberson <jroberson@jroberson.net>
To:        =?ISO-8859-15?Q?Ulrich_Sp=F6rlein?= <uqs@spoerlein.net>
Cc:        Attilio Rao <attilio@freebsd.org>, current@freebsd.org, Peter Jeremy <peterjeremy@acm.org>
Subject:   Re: LOR: ufs vs bufwait
Message-ID:  <alpine.BSF.2.00.1005121040390.1398@desktop>
In-Reply-To: <20100512141154.GF88504@acme.spoerlein.net>
References:  <20100508102005.GB1867@elmar.spoerlein.net> <20100510061057.GA93038@server.vk2pj.dyndns.org> <u2h3bbf2fe11005101353k493f3ca3v7c1216e840820c67@mail.gmail.com> <20100512141154.GF88504@acme.spoerlein.net>

On Wed, 12 May 2010, Ulrich Spörlein wrote:

> On Mon, 10.05.2010 at 22:53:32 +0200, Attilio Rao wrote:
>> 2010/5/10 Peter Jeremy <peterjeremy@acm.org>:
>>> On 2010-May-08 12:20:05 +0200, Ulrich Spörlein <uqs@spoerlein.net> wrote:
>>>> This LOR also is not yet listed on the LOR page, so I guess it's rather
>>>> new. I do use SUJ.
>>>>
>>>> lock order reversal:
>>>> 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
>>>> 2nd 0xec0fe304 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
>>>> 3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091
>>>
>>> I'm seeing exactly the same LOR (and subsequent deadlock) on a recent
>>> -current without SUJ.
>>
>> I think this LOR has been reported for a long time.
>> The deadlock may be new and somehow related to the vm_page_lock work
>> (if not SUJ).
>
> I was not able to reproduce this with a kernel prior to SUJ; a kernel
> just after SUJ went in shows this "deadlock" or infinite loop ...
>
> Now it might be that the SUJ kernel only increases the pressure so it
> happens during a system's uptime. It does not seem directly related to
> actually using SUJ on a volume, as I could reproduce it with SU only,
> too.
>
> I will try to get a hang not involving GELI and also re-do my tests when
> the volumes have neither SUJ nor SU enabled, which led to 10-20s "hangs"
> of the system IIRC. It seems SU/SUJ then only prolongs these hangs ad
> infinitum.

I think Peter Holm also saw this once while we were testing SUJ and 
reproduced ~30 second hangs with stock sources.  At this point we need to 
brainstorm ideas for adding debugging instrumentation and come up with the 
quickest possible repro.

It would probably be good to add some KTR tracing and log that when it 
wedges.  The core I looked at was hung in bufwait.  Is there any CPU 
activity or I/O activity when things hang?  You'll probably have to keep 
iostat/vmstat in memory to find out so they don't try to fault in pages 
once things are hung.

Thanks,
Jeff

>
> I'll be back next week with new results here
>
> Uli


