Date:      Wed, 18 Apr 2012 02:22:33 -0400
From:      Arnaud Lacombe <lacombar@gmail.com>
To:        freebsd-stable <freebsd-stable@freebsd.org>,  FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Complete hang on 9.0-RELEASE
Message-ID:  <CACqU3MWSyr_toZcOvQrNpLxX=ytNyDfDxpVKxxhC3%2BBACO6HPw@mail.gmail.com>
In-Reply-To: <CACqU3MWx5S-v4jya2JEtT6d=9TOXcyR_Do8yybBY8%2Bkg16HpxA@mail.gmail.com>
References:  <CACqU3MUefo4mG3GdZnj6kxxFx4H_M3-NLys8pCKptqNU4r_ywA@mail.gmail.com> <CACqU3MWx5S-v4jya2JEtT6d=9TOXcyR_Do8yybBY8%2Bkg16HpxA@mail.gmail.com>

Hi,

On Mon, Apr 16, 2012 at 5:50 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
> Hi,
>
> [for the record...]
>
> On Tue, Feb 14, 2012 at 11:41 AM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>> Hi folks,
>>
>> For the record, I was running some tests yesterday on top of a
>> 9.0-RELEASE amd64 kernel when the box hung. At the time of the
>> hang, the box was running a process with about 2800 threads with heavy
>> IPC between 1400 writers and 1400 readers. The box was in single-user
>> mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is the beginning
>> of the dmesg:
>>
>> Copyright (c) 1992-2012 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>         The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012
>>     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>> CPU: Intel(R) Atom(TM) CPU D510   @ 1.66GHz (1666.70-MHz K8-class CPU)
>>   Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
>>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>>   Features2=0x40e31d<SSE3,DTES64,MON,DS_CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE>
>>   AMD Features=0x20000800<SYSCALL,LM>
>>   AMD Features2=0x1<LAHF>
>>   TSC: P-state invariant, performance statistics
>> real memory  = 2137587712 (2038 MB)
>> avail memory = 2037841920 (1943 MB)
>> Event timer "LAPIC" quality 400
>> ACPI APIC Table: <070611 APIC1125>
>> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
>> FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
>>  cpu0 (BSP): APIC ID:  0
>>  cpu1 (AP/HT): APIC ID:  1
>>  cpu2 (AP): APIC ID:  2
>>  cpu3 (AP/HT): APIC ID:  3
>>
>> I will restart the test and see if this happens again.
>>
> I reproduced the previous problem on 10-CURRENT from r233917, on the
> following platform (here running 8.2-RELEASE):
>
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011
>     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Atom(TM) CPU D525   @ 1.80GHz (1800.01-MHz K8-class CPU)
>   Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Features2=0x40e31d<SSE3,DTES64,MON,DS_CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE>
>   AMD Features=0x20100800<SYSCALL,NX,LM>
>   AMD Features2=0x1<LAHF>
>   TSC: P-state invariant
> real memory  = 2136539136 (2037 MB)
> avail memory = 2043772928 (1949 MB)
> ACPI APIC Table: <010312 APIC0947>
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP/HT): APIC ID:  1
>  cpu2 (AP): APIC ID:  2
>  cpu3 (AP/HT): APIC ID:  3
>
> Complete system freeze while running about 2400 threads. I had to
> power cycle the system to bring it back. I discussed a way to
> debug this with attilio@ on freebsd-stable@, but I have not yet had
> time to implement it.
>
10-CURRENT from r233917 hung again today while running 3600 threads.
I enabled WITNESS and INVARIANTS on that specific kernel, secretly
hoping that they would produce some meaningful information, but they
did not. I guess my last resort is to enable SW_WATCHDOG and
gather some state information from DDB when the watchdog triggers, if
it does...
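
For reference, a kernel configuration fragment for that attempt could
look like the sketch below. This is an assumption about a reasonable
option set, not the configuration actually used on the test box; some of
these options may already be present in the base config, in which case
config(8) will warn about duplicates.

```
# Hypothetical additions to the kernel config file (sketch only).
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger
options 	SW_WATCHDOG		# software watchdog, checked from the clock interrupt
options 	WITNESS			# lock-order verification
options 	INVARIANTS		# extra runtime consistency checks
options 	INVARIANT_SUPPORT	# required by INVARIANTS
```

The software watchdog only fires once it has been armed from userland,
for instance by starting watchdogd(8) with a timeout (-t) before the
test. If it does fire and drops into DDB, commands such as "ps",
"show allpcpu", "show alllocks" and "alltrace" are the obvious candidates
for collecting state.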

Btw, this issue seems to happen specifically on the Atom/ICH8M
platform running an amd64 kernel: I have never seen it on other
platforms, despite running extensive tests there. I am not sure
whether it also happens on i386; I would need to check.

 - Arnaud


