From: Troy Bowman
To: freebsd-stable@freebsd.org
Date: Mon, 27 Dec 2004 14:52:45 -0700
Message-Id: <1104184365.16903.29.camel@gargamel>
Subject: am64/FreeBSD-5.3-STABLE (or RELEASE) crashes often

On top of the crashes, it doesn't write its core to the dump swap space, so I can't run savecore after a reboot to get debugging info. I have that swap space commented out in fstab so that it doesn't come up at boot and I can harvest the core manually, but all I get is "savecore: no dumps found." (It doesn't happen automatically, either.)

We recently thought we'd give 5.3 a go in production, and it has been too unstable. When it crashes, it doesn't reboot; it just hangs there until someone drives in and pushes the button.
Who knows, maybe Linux would be more stable at this point. Sigh.

The hardware it is running on is a Tyan S2875 with dual amd64/246 processors and 2 GB of registered DDR RAM (Corsair). We're also running vinum for all of the filesystems, mirroring them all, including the root filesystem. The vinum volumes are on two SATA WD Raptors. I have one older IDE drive plugged in to capture the kernel dumps.

We've tried many different memory configurations to see if we can tune it so that FreeBSD can handle it (DRAM ECC vs. master ECC, bank and node interleaving turned off/on, slowing the memory down, DRAM Scrub Redirect off/on, etc.), to no avail. It's usually pagedaemon that croaks, but for some reason it also crashes in the keyboard IRQ process and the serial I/O IRQ process. I guess that since it's usually the pager that dies, that's the reason why I can't get kernel dumps.

Here are some (manually copied) panics from the console.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xffffffff80389aea
stack pointer           = 0x10:0xffffffffb2051a60
frame pointer           = 0x10:0xffffff006b12d000
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 53 (pagedaemon)
trap number             = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 10h18m49s

...

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xffffffff8038a10a
frame pointer           = 0x10:0xffffffffb2051ab0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 53 (pagedaemon)
trap number             = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 15h59m55s

...

                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 36 (swi5: clock sio)
trap number             = 12
panic: page fault
cpuid = 1
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xffffffff803a40d3
stack pointer           = 0x10:0xffffffffb1d63650
frame pointer           = 0x10:0xffffff007b7f3a40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 30
trap number             = 12
panic: page fault
cpuid = 1
spin lock sched lock held by 0xffffff007b8177b0 for > 5 seconds

...

What can I do to debug this further if I can't harvest the kernel dumps to report a bug? Is there anything the FreeBSD team can do? Do I need to resort to Linux for dual amd64 support for now?
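
Without a dump, about the only data to work with is the console text itself. As a rough aid for comparing the panics above, here is a minimal Python sketch that pulls the fault address, instruction pointer and process out of each message. It assumes the hand-copied values are accurate; the helper names (parse_panic, summarize) and the "null_like" flag are made up for illustration, and the 4 KiB threshold is only a rule of thumb: a fault address that small often means a NULL structure pointer plus a member offset, but it does not prove it.

```python
import re

# Fields transcribed from the three console panics quoted above.
PANICS = """\
fault virtual address = 0x88
instruction pointer = 0x8:0xffffffff80389aea
current process = 53 (pagedaemon)
---
fault virtual address = 0x88
instruction pointer = 0x8:0xffffffff8038a10a
current process = 53 (pagedaemon)
---
fault virtual address = 0x48
instruction pointer = 0x8:0xffffffff803a40d3
current process = 30
"""

# Matches "name = value" lines as printed by the trap handler.
FIELD = re.compile(r"^\s*([a-z ]+?)\s*=\s*(.+?)\s*$")


def parse_panic(text):
    """Return a dict of the 'name = value' fields in one panic message."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def summarize(fields):
    """Extract the fault address, the faulting PC and the process."""
    fault = int(fields["fault virtual address"], 16)
    # The instruction pointer is printed as selector:offset; keep the offset.
    pc = int(fields["instruction pointer"].split(":")[1].strip(), 16)
    return {
        "fault": fault,
        "pc": pc,
        "process": fields.get("current process", "?"),
        # Assumption (rule of thumb): an address inside the first 4 KiB page
        # looks like a NULL base pointer plus a small member offset.
        "null_like": fault < 0x1000,
    }


reports = [summarize(parse_panic(chunk)) for chunk in PANICS.split("---")]
for r in reports:
    print(f"pc={r['pc']:#x} fault={r['fault']:#x} "
          f"null_like={r['null_like']} proc={r['process']}")
```

On these three messages every fault address falls in the first page, and the two pagedaemon instruction pointers are 0x620 bytes apart, which is at least consistent with both faults coming from the same region of kernel code. Mapping the addresses to function names would need the symbol table of the exact kernel that was running, so that part stays a guess here.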

Thanks,
../troy