Date:      Mon, 5 Mar 2012 13:39:18 -0500
From:      Arnaud Lacombe <lacombar@gmail.com>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: Complete hang on 9.0-RELEASE
Message-ID:  <CACqU3MXCf4U9OHShR1BwvPOsiP9=5=A4oTZR02qC8weKhU6p6g@mail.gmail.com>
In-Reply-To: <CACqU3MVBzkg%2BjzkBNjcXRqTfxZEX0rXs3HwjQN=hLSfSZWGn7g@mail.gmail.com>
References:  <CACqU3MUefo4mG3GdZnj6kxxFx4H_M3-NLys8pCKptqNU4r_ywA@mail.gmail.com> <CACqU3MVs1mpiQpjE9xC8aFAKxhzbjUgC_6GKWdAkyr8OGJhycw@mail.gmail.com> <CAJ-FndDz6eRamnf7v6kZwwZQp-JaLYUKX6Gx7MYuZGEFNagmfQ@mail.gmail.com> <CACqU3MV-mDHzmnXY3Mzc%2BBnimJSnUTAPk66fh%2Bzzdfgz4OyPFg@mail.gmail.com> <CACqU3MVL14TxJ81rbM-Oq2P8GZCE0hPKzQpb5eJqZ32YdowSjQ@mail.gmail.com> <CACqU3MVko6jKjs98JeS1NqBp%2BFR0YtMqPq570J3dN7BPyFvdkA@mail.gmail.com> <CAJ-FndCtA6F=XzTbYsDD1y3-aXuOSMDobtSuOgdNVwaNU4kY_A@mail.gmail.com> <CACqU3MWdJAp2XqWESUTtvX7CeESm=WEcqcRa0T105Kx95jMXzA@mail.gmail.com> <CAJ-FndBW2=78cEWfvYFDjZ3z_VOs-Gj836eo7pgwmy0UmuaCeA@mail.gmail.com> <CACqU3MVBzkg%2BjzkBNjcXRqTfxZEX0rXs3HwjQN=hLSfSZWGn7g@mail.gmail.com>

Hi,

On Wed, Feb 29, 2012 at 2:31 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
> Hi,
>
> On Wed, Feb 29, 2012 at 2:22 PM, Attilio Rao <attilio@freebsd.org> wrote:
>> 2012/2/29, Arnaud Lacombe <lacombar@gmail.com>:
>>> Hi,
>>>
>>> On Wed, Feb 29, 2012 at 1:44 PM, Attilio Rao <attilio@freebsd.org> wrote:
>>>> 2012/2/29, Arnaud Lacombe <lacombar@gmail.com>:
>>>>> Hi,
>>>>>
>>>>> On Wed, Feb 29, 2012 at 12:59 PM, Arnaud Lacombe <lacombar@gmail.com>
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, Feb 27, 2012 at 12:48 PM, Arnaud Lacombe <lacombar@gmail.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, Feb 27, 2012 at 10:36 AM, Attilio Rao <attilio@freebsd.org>
>>>>>>> wrote:
>>>>>>>> 2012/2/27, Arnaud Lacombe <lacombar@gmail.com>:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On Tue, Feb 14, 2012 at 11:41 AM, Arnaud Lacombe <lacombar@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> For the record, I was running some tests yesterday on top of a
>>>>>>>>>> 9.0-RELEASE amd64 kernel when the box hung. At the time of the
>>>>>>>>>> hang, the box was running a process with about 2800 threads with
>>>>>>>>>> heavy IPC between 1400 writers and 1400 readers. The box was in
>>>>>>>>>> single-user mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is
>>>>>>>>>> the beginning of the dmesg:
>>>>>>>>>>
>>>>>>>>> This happened a second time, now with FreeBSD 8.2-RELEASE. Complete
>>>>>>>>> machine hang. The machine was running about 4000 threads in a single
>>>>>>>>> process; all the other conditions are the same.
>>>>>>>>
>>>>>>>> Arnaud,
>>>>>>>> can you please break into your kernel via KDB and collect the following
>>>>>>>> information from the DDB prompt:
>>>>>>>> - ps
>>>>>>>> - alltrace
>>>>>>>> - show allpcpu
>>>>>>>> - possibly get a coredump with 'call doadump'
>>>>>>>>
>>>>>>> Will do, but I'll need to rebuild a kernel to include DDB.
>>>>>>>
>>>>>>>> and in the end provide all those along with kernel binary and possibly
>>>>>>>> sources somewhere?
>>>>>>>>
>>>>>>> I'll be testing a bare `release/8.2.0' with the following patch:
>>>>>>>
>>>>>>> diff --git a/sys/amd64/conf/GENERIC b/sys/amd64/conf/GENERIC
>>>>>>> index c3e0095..7bd997f 100644
>>>>>>> --- a/sys/amd64/conf/GENERIC
>>>>>>> +++ b/sys/amd64/conf/GENERIC
>>>>>>> @@ -79,6 +79,10 @@ options      INCLUDE_CONFIG_FILE     # Include this file in kernel
>>>>>>>
>>>>>>>  options        KDB           # Kernel debugger related code
>>>>>>>  options        KDB_TRACE     # Print a stack trace for a panic
>>>>>>> +options        DDB
>>>>>>> +options        BREAK_TO_DEBUGGER
>>>>>>> +options        ALT_BREAK_TO_DEBUGGER
>>>>>>>
>>>>>>>  # Make an SMP-capable kernel by default
>>>>>>>  options        SMP           # Symmetric MultiProcessor Kernel
>>>>>>>
>>>>>> ok, it happened again after 2 days, the process was running about 3200
>>>>>> threads. I'm trying to break into DDB and let you know, I'm not that
>>>>>> successful for now...
>>>>>>
>>>>> No luck. Neither BREAK nor ALT_BREAK is responding. I will not touch
>>>>> the system in the next few hours if you want me to test something on
>>>>> it. In the event that 8.2-RELEASE or 9.0-RELEASE is not meant to work
>>>>> reliably on top of a 7.4-RELEASE userland, I will set the test up again
>>>>> on a clean 9.0-RELEASE system and retry.
>>>>
>>>> We allow KBI breakage when new releases happen, so this may cause
>>>> breakage for you, even if a deadlock is really not something you want.
>>>>
>>>> Can you try enabling SW_WATCHDOG, DEADLKRES and possibly arm your ichwd?
>>>> if the breakage involves clocks or interrupt sources there are still
>>>> chances they will be able to catch it though.
>>>>
>>>> However, it doesn't seem you are setup with a proper serial console?
>>> The serial console is definitely working fine. I can break into DDB
>>> at will when the test is running. I did not test with ALT_BREAK
>>> per-se, but BREAK does work.
>>
>> So if you try to break in DDB via serial break it doesn't work?
>> That is definitely very bad...
>>
> just to be sure, I rebooted the system and I could break into DDB at
> the first attempt with ALT_BREAK, BREAK was a bit more reluctant but
> worked too. So yes, this does not taste good :/
>
>> Can you try with the options I mentioned earlier and see if something changes?
>>
> will do, but I will first attempt to reproduce this on 9.0-RELEASE.
>
9.0-RELEASE (kernel + userland) hung today while running 2000
threads. The next step is to reproduce it with a watchdog- and
textdump-enabled kernel.
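
For reference, a minimal sketch of the kernel config additions such a
build might use, on top of the GENERIC patch quoted above. This is an
assumption about what the next test kernel could contain, not the exact
config that was built; option names are taken from sys/conf/NOTES and
textdump(4), and the comments are illustrative.

```
# Hypothetical additions to a copy of sys/amd64/conf/GENERIC
options 	KDB			# Kernel debugger related code
options 	DDB			# Interactive debugger, needed for textdumps
options 	BREAK_TO_DEBUGGER	# Enter the debugger on a serial BREAK
options 	ALT_BREAK_TO_DEBUGGER	# Enter the debugger on the alternate break sequence
options 	SW_WATCHDOG		# Software watchdog, as suggested earlier in the thread
options 	DEADLKRES		# Deadlock resolver, as suggested earlier in the thread
options 	TEXTDUMP_PREFERRED	# Prefer textdumps over full memory dumps
```

SW_WATCHDOG only fires once something arms it, typically watchdogd(8),
so the kernel options alone would not be enough.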

 - Arnaud


