Date: Tue, 16 Oct 2007 00:49:17 +0100
From: Deomid Ryabkov <myself@rojer.pp.ru>
To: freebsd-hackers@freebsd.org
Subject: Re: 6.2: reproducible hang on amd64, traced to 24h of commits
Message-ID: <4713FC7D.6070201@rojer.pp.ru>
In-Reply-To: <460D13B0.5070500@rojer.pp.ru>

fwiw, i have not traced it down to a commit (got fed up with hangs), but conclusively singled out smartmontools as the trigger.
after adding 2 more disks, machine wouldn't even boot up past starting smartmontools, locking up hard with the same symptoms.
with smartmontools disabled, it booted up and has been up for > 2 months now.

Deomid Ryabkov wrote:
> ok, now that the machine has been up for 10 days, i am reasonably sure
> i've close enough to this one.
>
> back in january i cvsupped to -STABLE and the box (dual head opteron
> box) started hanging.
> and i mean it dies completely.
> i have all debug options and a working serial console, but still it
> just dies and both serial and system console are unresponsive.
> no panic message on either, nothing. pretty sad.
>
> the kernel config is vanilla SMP GENERIC, with all debug options i
> could think of enabled (after it started hanging).
>
> so the first thing i did after rebooting the box a couple of times is
> fall back to kernel.old (6.1-STABLE circa august '06).
> no hangs. i then started incrementally updating, gradually getting
> closer to jan 22.
> long story short, i seem to have isolated the problem to commits made
> between
> date=2006.12.28.00.00.00 and date=2006.12.29.00.00.00.
> last hang i had was when running the 12/29 kernel, now it's 12/28 and
> the box has been up for 2 weeks already.
> based on previous experience i'm pretty certain that this is it. with
> bad kernel the box would never stay up more than a few days, never
> more than 5.
> between 12/28 and 12/29 i see some changes to /sys/amd64/ and
> /sys/pci/, which might've been the cause.
> i will probably start looking into individual changes, but if anyone
> more experienced than me could take a look, it'd be appreciated.
> i am willing to try patches.
> i confirmed that recent (as of 3 weeks or so) -STABLE still has this
> problem.
>
> thanks in advance.
>
> ====
> files under /sys that were changed between 12/28 and 12/29:
>
> Edit src/sys/amd64/amd64/mptable_pci.c
> Edit src/sys/amd64/pci/pci_bus.c
> Edit src/sys/contrib/dev/ath/public/wackelf.c
> Edit src/sys/dev/acpica/acpi_pci.c
> Edit src/sys/dev/acpica/acpi_pcib_acpi.c
> Edit src/sys/dev/acpica/acpi_pcib_pci.c
> Checkout src/sys/dev/ath/if_ath.c
> Edit src/sys/dev/cardbus/cardbus.c
> Edit src/sys/dev/drm/drm_agpsupport.c
> Edit src/sys/dev/pci/pci.c
> Edit src/sys/dev/pci/pci_if.m
> Edit src/sys/dev/pci/pci_pci.c
> Edit src/sys/dev/pci/pci_private.h
> Edit src/sys/dev/pci/pcib_private.h
> Edit src/sys/dev/pci/pcivar.h
> Edit src/sys/i386/i386/mptable_pci.c
> Edit src/sys/i386/pci/pci_bus.c
> Edit src/sys/kern/subr_bus.c
> Checkout src/sys/netgraph/ng_deflate.h
> Edit src/sys/pci/agp.c
> Edit src/sys/pci/agpreg.h
> Edit src/sys/powerpc/ofw/ofw_pcib_pci.c
> Edit src/sys/sparc64/pci/apb.c
> Edit src/sys/sparc64/pci/ofw_pcib.c
> Edit src/sys/sparc64/pci/ofw_pcibus.c
> Edit src/sys/sys/param.h
>
>
> ====
> kernel configuration used:
>
> include GENERIC
>
> options SMP
>
> options KDB
> options DDB
>
> makeoptions DEBUG=-g
> options INVARIANTS
> options INVARIANT_SUPPORT
> options WITNESS
> options DEBUG_LOCKS
> options DEBUG_VFS_LOCKS
> options DIAGNOSTIC
> ====
>

-- 
Deomid Ryabkov aka Rojer
myself@rojer.pp.ru
rojer@sysadmins.ru
ICQ: 8025844
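
for anyone wanting to narrow a regression by checkout date the way the quoted message describes, here is a minimal sketch in Python. note the assumptions: the original poster updated incrementally rather than by binary search, so the bisection below is a swapped-in technique, not his method. first_bad_date, cvsup_date_tag and the is_bad callback are hypothetical names made up for illustration. in practice is_bad would mean "build and boot the kernel from that date and wait for a hang", and since the hang could take up to 5 days to show, a "good" verdict is only as reliable as the time you waited.

```python
from datetime import date, timedelta


def first_bad_date(good: date, bad: date, is_bad) -> date:
    """Return the earliest date whose source snapshot is bad.

    Assumes `good` is a known-good date, `bad` is a known-bad date,
    good < bad, and that every snapshot after the first bad one is
    also bad (the regression was never reverted in between).
    """
    if good >= bad:
        raise ValueError("good must be earlier than bad")
    # Halve the interval until good and bad are adjacent days.
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


def cvsup_date_tag(d: date) -> str:
    """Format a date the way it appears in the message (midnight snapshot)."""
    return d.strftime("date=%Y.%m.%d.00.00.00")


if __name__ == "__main__":
    # Stand-in for the real test: pretend everything from 12/29 on hangs.
    def simulated(d: date) -> bool:
        return d >= date(2006, 12, 29)

    culprit = first_bad_date(date(2006, 8, 1), date(2007, 1, 22), simulated)
    print("last good:", cvsup_date_tag(culprit - timedelta(days=1)))
    print("first bad:", cvsup_date_tag(culprit))
```

with one-day granularity this ends at the same kind of 24-hour window as in the message; going from there to a single commit still means looking at the individual changes in that window.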
