Date: Mon, 02 Mar 2009 08:11:07 -0600
From: Guy Helmer <ghelmer@palisadesys.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: 7.1 hangs in cache_lookup mutex?
Message-ID: <49ABE8FB.3060202@palisadesys.com>
In-Reply-To: <49A80A55.5070004@palisadesys.com>
References: <49A46AB4.3080003@palisadesys.com> <200902261648.32845.jhb@freebsd.org> <49A7173B.4030608@palisadesys.com> <200902261753.29607.jhb@freebsd.org> <49A80A55.5070004@palisadesys.com>
Guy Helmer wrote:
> John Baldwin wrote:
>> On Thursday 26 February 2009 5:27:07 pm Guy Helmer wrote:
>>
>>> John Baldwin wrote:
>>>
>>>> On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote:
>>>>
>>>>> db> show sleepchain 23110
>>>>> thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
>>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>>> db> show sleepchain 23092
>>>>> thread 100208 (pid 23092, kvoop) is on a run queue
>>>>>
>>>> Ah, so this is normal (well, mostly) in that kvoop is simply on the
>>>> run queue waiting for a CPU. Can you find the thread pointer for
>>>> kvoop and check on things such as if it is pinned and if so to
>>>> which CPU (td_pinned will tell you the first, and
>>>> td_sched->ts_cpu will tell you the second with ULE).
>>>>
>>> (kgdb) print td->td_pinned
>>> $2 = 0
>>>
>>
>> Ok, not pinned.
>>
>>
>>> From my captured ddb run:
>>> cpuid = 3
>>> curthread = 0xc5e2f000: pid 23090 "filter"
>>> curpcb = 0xe6f90d90
>>> fpcurthread = none
>>> idlethread = 0xc442daf0: pid 11 "idle: cpu3"
>>> APIC ID = 7
>>> currentldt = 0x50
>>> spin locks held:
>>>
>>
>> At http://www.freebsd.org/~jhb/gdb/ you can find my kgdb scripts. If
>> you source gdb6 you can run 'runtds' which will show you what each
>> CPU is doing (more or less) in ps-style output.
>>
>>
>>> I sure wish I could find the root cause of the hangs. On a hunch, I
>>> tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and it
>>> has run 32 hours without a hang. It could just be coincidence,
>>> though...
>>>
>>
>> Ahhh, that actually could explain it perhaps. Do your CPUs support
>> C2 or higher sleep states for idle? You can try limiting it to only
>> C1 (or disable C1E in your BIOS if it has an option for that) to see
>> if that fixes it.
>>
>>
> I don't think the CPUs support anything lower than C1 - there is no
> hw.acpi.cpu.cx_supported sysctl node, and hw.acpi.cpu.cx_lowest is C1.
> C1-Enhanced was already disabled in the BIOS, at least on the machine
> running amd64. 48 hours of runtime, and no hangs seen yet. I did
> reboot it this morning to check the sleep settings in the BIOS.

Despite having machdep.cpu_idle_hlt=0, the machine wedged for 40 hours
over the weekend but came back to life by itself. Could this be lost
IPIs, or a bug in the scheduler?

Guy