From: Deomid Ryabkov
Date: Tue, 16 Oct 2007 00:49:17 +0100
To: freebsd-hackers@freebsd.org
Subject: Re: 6.2: reproducible hang on amd64, traced to 24h of commits
Message-ID: <4713FC7D.6070201@rojer.pp.ru>
In-Reply-To: <460D13B0.5070500@rojer.pp.ru>

fwiw, i have not traced it down to a commit (got fed up with hangs),
but conclusively singled out smartmontools as the trigger.
after adding 2 more disks, the machine wouldn't even boot up past
starting smartmontools, locking up hard with the same symptoms.
with smartmontools disabled, it booted up and has been up for more
than 2 months now.

Deomid Ryabkov wrote:
> ok, now that the machine has been up for 10 days, i am reasonably sure
> i've gotten close enough to this one.
>
> back in january i cvsupped to -STABLE and the box (dual head opteron
> box) started hanging.
> and i mean it dies completely.
> i have all debug options and a working serial console, but still it
> just dies and both serial and system console are unresponsive.
> no panic message on either, nothing. pretty sad.
>
> the kernel config is vanilla SMP GENERIC, with all debug options i
> could think of enabled (after it started hanging).
>
> so the first thing i did after rebooting the box a couple of times was
> fall back to kernel.old (6.1-STABLE circa august '06).
> no hangs. i then started incrementally updating, gradually getting
> closer to jan 22.
> long story short, i seem to have isolated the problem to commits made
> between
> date=2006.12.28.00.00.00 and date=2006.12.29.00.00.00.
> last hang i had was when running the 12/29 kernel, now it's 12/28 and
> the box has been up for 2 weeks already.
> based on previous experience i'm pretty certain that this is it. with
> a bad kernel the box would never stay up more than a few days, never
> more than 5.
> between 12/28 and 12/29 i see some changes to /sys/amd64/ and
> /sys/pci/, which might've been the cause.
> i will probably start looking into individual changes, but if anyone
> more experienced than me could take a look, it'd be appreciated.
> i am willing to try patches.
> i confirmed that recent (as of 3 weeks or so) -STABLE still has this
> problem.
>
> thanks in advance.
>
> ====
> files under /sys that were changed between 12/28 and 12/29:
>
> Edit src/sys/amd64/amd64/mptable_pci.c
> Edit src/sys/amd64/pci/pci_bus.c
> Edit src/sys/contrib/dev/ath/public/wackelf.c
> Edit src/sys/dev/acpica/acpi_pci.c
> Edit src/sys/dev/acpica/acpi_pcib_acpi.c
> Edit src/sys/dev/acpica/acpi_pcib_pci.c
> Checkout src/sys/dev/ath/if_ath.c
> Edit src/sys/dev/cardbus/cardbus.c
> Edit src/sys/dev/drm/drm_agpsupport.c
> Edit src/sys/dev/pci/pci.c
> Edit src/sys/dev/pci/pci_if.m
> Edit src/sys/dev/pci/pci_pci.c
> Edit src/sys/dev/pci/pci_private.h
> Edit src/sys/dev/pci/pcib_private.h
> Edit src/sys/dev/pci/pcivar.h
> Edit src/sys/i386/i386/mptable_pci.c
> Edit src/sys/i386/pci/pci_bus.c
> Edit src/sys/kern/subr_bus.c
> Checkout src/sys/netgraph/ng_deflate.h
> Edit src/sys/pci/agp.c
> Edit src/sys/pci/agpreg.h
> Edit src/sys/powerpc/ofw/ofw_pcib_pci.c
> Edit src/sys/sparc64/pci/apb.c
> Edit src/sys/sparc64/pci/ofw_pcib.c
> Edit src/sys/sparc64/pci/ofw_pcibus.c
> Edit src/sys/sys/param.h
>
>
> ====
> kernel configuration used:
>
> include GENERIC
>
> options SMP
>
> options KDB
> options DDB
>
> makeoptions DEBUG=-g
> options INVARIANTS
> options INVARIANT_SUPPORT
> options WITNESS
> options DEBUG_LOCKS
> options DEBUG_VFS_LOCKS
> options DIAGNOSTIC
> ====
>

-- 
Deomid Ryabkov aka Rojer
myself@rojer.pp.ru
rojer@sysadmins.ru
ICQ: 8025844