Date:      Tue, 16 Oct 2007 00:49:17 +0100
From:      Deomid Ryabkov <myself@rojer.pp.ru>
To:        freebsd-hackers@freebsd.org
Subject:   Re: 6.2: reproducible hang on amd64, traced to 24h of commits
Message-ID:  <4713FC7D.6070201@rojer.pp.ru>
In-Reply-To: <460D13B0.5070500@rojer.pp.ru>
References:  <460D13B0.5070500@rojer.pp.ru>

fwiw, i have not traced it down to a commit (got fed up with hangs), but 
conclusively singled out smartmontools as the trigger.
after adding 2 more disks, machine wouldn't even boot up past starting 
smartmontools, locking up hard with the same symptoms.
with smartmontools disabled, it booted up and has been up for > 2 months 
now.

Deomid Ryabkov wrote:
> ok, now that the machine has been up for 10 days, i am reasonably sure 
> i've got close enough to this one.
>
> back in january i cvsupped to -STABLE and the box (dual head opteron 
> box) started hanging.
> and i mean it dies completely.
> i have all debug options and a working serial console, but still it 
> just dies and both serial and system console are unresponsive.
> no panic message on either, nothing. pretty sad.
>
> the kernel config is vanilla SMP GENERIC, with all debug options i 
> could think of enabled (after it started hanging).
>
> so the first thing i did after rebooting the box a couple of times was 
> fall back to kernel.old (6.1-STABLE circa august '06).
> no hangs. i then started incrementally updating, gradually getting 
> closer to jan 22.
> long story short, i seem to have isolated the problem to commits made 
> between
> date=2006.12.28.00.00.00 and date=2006.12.29.00.00.00.
> last hang i had was when running the 12/29 kernel, now it's 12/28 and 
> the box has been up for 2 weeks already.
> based on previous experience i'm pretty certain that this is it. with 
> bad kernel the box would never stay up more than a few days, never 
> more than 5.
> between 12/28 and 12/29 i see some changes to /sys/amd64/ and 
> /sys/pci/, which might've been the cause.
> i will probably start looking into individual changes, but if anyone 
> more experienced than me could take a look, it'd be appreciated.
> i am willing to try patches.
> i confirmed that recent (as of 3 weeks or so) -STABLE still has this 
> problem.
>
> thanks in advance.
>
> ====
> files under /sys that were changed between 12/28 and 12/29:
>
> Edit src/sys/amd64/amd64/mptable_pci.c
> Edit src/sys/amd64/pci/pci_bus.c
> Edit src/sys/contrib/dev/ath/public/wackelf.c
> Edit src/sys/dev/acpica/acpi_pci.c
> Edit src/sys/dev/acpica/acpi_pcib_acpi.c
> Edit src/sys/dev/acpica/acpi_pcib_pci.c
> Checkout src/sys/dev/ath/if_ath.c
> Edit src/sys/dev/cardbus/cardbus.c
> Edit src/sys/dev/drm/drm_agpsupport.c
> Edit src/sys/dev/pci/pci.c
> Edit src/sys/dev/pci/pci_if.m
> Edit src/sys/dev/pci/pci_pci.c
> Edit src/sys/dev/pci/pci_private.h
> Edit src/sys/dev/pci/pcib_private.h
> Edit src/sys/dev/pci/pcivar.h
> Edit src/sys/i386/i386/mptable_pci.c
> Edit src/sys/i386/pci/pci_bus.c
> Edit src/sys/kern/subr_bus.c
> Checkout src/sys/netgraph/ng_deflate.h
> Edit src/sys/pci/agp.c
> Edit src/sys/pci/agpreg.h
> Edit src/sys/powerpc/ofw/ofw_pcib_pci.c
> Edit src/sys/sparc64/pci/apb.c
> Edit src/sys/sparc64/pci/ofw_pcib.c
> Edit src/sys/sparc64/pci/ofw_pcibus.c
> Edit src/sys/sys/param.h
>
>
> ====
> kernel configuration used:
>
> include GENERIC
>
> options SMP
>
> options KDB
> options DDB
>
> makeoptions DEBUG=-g
> options INVARIANTS
> options INVARIANT_SUPPORT
> options WITNESS
> options DEBUG_LOCKS
> options DEBUG_VFS_LOCKS
> options DIAGNOSTIC
> ====
>
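
for anyone retracing the date-based narrowing described in the quoted message: the search over source dates can be sketched roughly as below. this is only an illustration, not the procedure actually used (the message describes updating incrementally, while the sketch swaps in a plain binary search over days). the names bisect_dates, cvsup_date and the hangs callback are hypothetical, the august 1 start date is an assumption standing in for "circa august '06", and in practice the "does it hang" check meant days or weeks of watching uptime, not a function call.

```python
from datetime import date, timedelta


def cvsup_date(d):
    """Format a day the way the quoted message writes it."""
    return d.strftime("date=%Y.%m.%d.00.00.00")


def bisect_dates(good, bad, hangs):
    """Narrow a known-good and a known-bad source date to adjacent days.

    good  -- date whose kernel is known to stay up
    bad   -- later date whose kernel is known to hang
    hangs -- hypothetical callback: hangs(day) is True if a kernel built
             from sources as of that day hangs
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if hangs(mid):
            bad = mid    # problem already present at mid
        else:
            good = mid   # problem introduced after mid
    return good, bad


if __name__ == "__main__":
    # Stand-in oracle for illustration only: pretend the offending change
    # is present in sources dated 2006-12-29 and later.
    first_bad = date(2006, 12, 29)
    good, bad = bisect_dates(
        date(2006, 8, 1),    # assumed date for the kernel.old sources
        date(2007, 1, 22),   # the -STABLE sources that started hanging
        lambda day: day >= first_bad,
    )
    print(cvsup_date(good), cvsup_date(bad))
```

with an intermittent hang like this one, a "good" verdict is never fully certain, so each step would need a long enough soak (the message suggests more than 5 days) before moving the lower bound.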


-- 
Deomid Ryabkov aka Rojer
myself@rojer.pp.ru
rojer@sysadmins.ru
ICQ: 8025844




