Date:      Fri, 30 Mar 2007 14:42:08 +0100
From:      Deomid Ryabkov <myself@rojer.pp.ru>
To:        freebsd-hackers@freebsd.org
Subject:   6.2: reproducible hang on amd64, traced to 24h of commits
Message-ID:  <460D13B0.5070500@rojer.pp.ru>

ok, now that the machine has been up for 10 days, i am reasonably sure 
i've gotten close enough to this one.

back in january i cvsupped to -STABLE and the box (dual head opteron 
box) started hanging.
and i mean it dies completely.
i have all debug options and a working serial console, but still it just 
dies and both serial and system console are unresponsive.
no panic message on either, nothing. pretty sad.

the kernel config is vanilla SMP GENERIC, with all debug options i could 
think of enabled (after it started hanging).

so the first thing i did after rebooting the box a couple of times is 
fall back to kernel.old (6.1-STABLE circa august '06).
no hangs. i then started incrementally updating, gradually getting 
closer to jan 22.
long story short, i seem to have isolated the problem to commits made 
between
date=2006.12.28.00.00.00 and date=2006.12.29.00.00.00.
last hang i had was when running the 12/29 kernel, now it's 12/28 and 
the box has been up for 2 weeks already.
based on previous experience i'm pretty certain that this is it. with a 
bad kernel the box would never stay up more than a few days, never more 
than 5.
between 12/28 and 12/29 i see some changes to /sys/amd64/ and /sys/pci/, 
which might've been the cause.
i will probably start looking into individual changes, but if anyone 
more experienced than me could take a look, it'd be appreciated.
i am willing to try patches.
i confirmed that recent (as of about 3 weeks ago) -STABLE still has this 
problem.
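
for anyone who wants to narrow the 24h window further, here is a minimal sketch of bisecting by checkout date. it assumes the yyyy.mm.dd.hh.mm.ss format used in the date= values above; the function names (midpoint_tag, narrow) are hypothetical helpers, not part of cvsup. it only computes the next date to check out; building, booting and waiting for a hang is still manual, and since a bad kernel could take up to 5 days to hang, each step needs at least that long before a kernel can be called good.

```python
from datetime import datetime

# Date layout taken from the date= values quoted in this message.
CVSUP_FMT = "%Y.%m.%d.%H.%M.%S"


def parse_tag(tag):
    """Parse a yyyy.mm.dd.hh.mm.ss string into a datetime."""
    return datetime.strptime(tag, CVSUP_FMT)


def midpoint_tag(good, bad):
    """Return the date halfway between a known-good and a known-bad checkout."""
    g, b = parse_tag(good), parse_tag(bad)
    if not g < b:
        raise ValueError("good date must be earlier than bad date")
    mid = g + (b - g) / 2
    # Drop sub-second precision; the date format has no field for it.
    return mid.replace(microsecond=0).strftime(CVSUP_FMT)


def narrow(good, bad, mid_is_bad):
    """Shrink the (good, bad) window once the midpoint kernel has been tested."""
    mid = midpoint_tag(good, bad)
    return (good, mid) if mid_is_bad else (mid, bad)


if __name__ == "__main__":
    good, bad = "2006.12.28.00.00.00", "2006.12.29.00.00.00"
    print("next checkout: date=" + midpoint_tag(good, bad))
```

each round halves the window, so going from 24 hours down to about an hour takes 4 or 5 kernels. with only a handful of commits in the window, testing individual commits directly may be quicker than bisecting by time.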

thanks in advance.

====
files under /sys that were changed between 12/28 and 12/29:

 Edit src/sys/amd64/amd64/mptable_pci.c
 Edit src/sys/amd64/pci/pci_bus.c
 Edit src/sys/contrib/dev/ath/public/wackelf.c
 Edit src/sys/dev/acpica/acpi_pci.c
 Edit src/sys/dev/acpica/acpi_pcib_acpi.c
 Edit src/sys/dev/acpica/acpi_pcib_pci.c
 Checkout src/sys/dev/ath/if_ath.c
 Edit src/sys/dev/cardbus/cardbus.c
 Edit src/sys/dev/drm/drm_agpsupport.c
 Edit src/sys/dev/pci/pci.c
 Edit src/sys/dev/pci/pci_if.m
 Edit src/sys/dev/pci/pci_pci.c
 Edit src/sys/dev/pci/pci_private.h
 Edit src/sys/dev/pci/pcib_private.h
 Edit src/sys/dev/pci/pcivar.h
 Edit src/sys/i386/i386/mptable_pci.c
 Edit src/sys/i386/pci/pci_bus.c
 Edit src/sys/kern/subr_bus.c
 Checkout src/sys/netgraph/ng_deflate.h
 Edit src/sys/pci/agp.c
 Edit src/sys/pci/agpreg.h
 Edit src/sys/powerpc/ofw/ofw_pcib_pci.c
 Edit src/sys/sparc64/pci/apb.c
 Edit src/sys/sparc64/pci/ofw_pcib.c
 Edit src/sys/sparc64/pci/ofw_pcibus.c
 Edit src/sys/sys/param.h


====
kernel configuration used:

include GENERIC

options SMP

options KDB
options DDB

makeoptions DEBUG=-g
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC
====

-- 
Deomid Ryabkov aka Rojer
myself@rojer.pp.ru
rojer@sysadmins.ru
ICQ: 8025844



