From owner-freebsd-stable@FreeBSD.ORG Mon Dec 12 21:08:27 2005
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6FB6216A420
	for ; Mon, 12 Dec 2005 21:08:27 +0000 (GMT)
	(envelope-from atanas@asd.aplus.net)
Received: from pro20.abac.com (pro20.abac.com [66.226.64.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 35BCB43D6A
	for ; Mon, 12 Dec 2005 21:08:22 +0000 (GMT)
	(envelope-from atanas@asd.aplus.net)
Received: from [216.55.129.41] (asd0.aplus.net [216.55.129.41])
	(authenticated bits=0)
	by pro20.abac.com (8.13.4/8.13.4) with ESMTP id jBCL8KQu069475
	for ; Mon, 12 Dec 2005 13:08:20 -0800 (PST)
	(envelope-from atanas@asd.aplus.net)
Message-ID: <439DE88B.1090407@asd.aplus.net>
Date: Mon, 12 Dec 2005 13:15:55 -0800
From: Atanas
User-Agent: Mozilla Thunderbird 1.0.7 (X11/20051026)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: multipart/mixed; boundary="------------010501090004000608080205"
X-Spam-Score: 1.47 (SPF_SOFTFAIL)
Subject: 6.0 random freezes
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Mon, 12 Dec 2005 21:08:27 -0000

This is a multi-part message in MIME format.
--------------010501090004000608080205
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I have 3 machines running 6.0-RELEASE, and recently 2 of them started freezing once a day or so. There are no error messages on the console or in the system logs.

The first one I put in production about a month ago and it was working flawlessly until it got some load and now it started freezing almost every day.
The second one has exactly the same behavior - it was fine when doing nothing (a couple of weeks), and started freezing when loaded.

The load I'm talking about is less than moderate (less than 2.0 with plenty of CPU idle time). The freezes also do not appear to happen at peak times (I have rrdtool based CPU load graphs).

Both machines have (almost) identical motherboards:

Intel SE7520JR2SCSID2 and SE7520JR2ATAD2
2 Intel XeonE 3.2GHz 800MHz CPUs
4GB DDRII400 RegECC RAM

The first one has 8 72GB Ultra320 SCSI drives attached as plain drives (no raid) to the on-board SCSI controller. The second one has 8 500GB SATA2 drives attached to a <3ware Model 9550SX-8LP> controller and configured as a RAID5 array.

The motherboards have 2 1000Mbps NICs on board, but due to some (em) driver problems, I usually disable these from BIOS and use a PCI Intel 100Mbps (fxp) instead.

Both machines were running 6.0-RELEASE, i386. For the second one I had to update the twa driver manually, as the one shipped with 6.0 didn't support the 3ware 9550SX. I see that the new version recently got committed into the -STABLE branches.

Here are the diffs against the GENERIC kernel configuration:

< cpu I486_CPU
< cpu I586_CPU
< makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
< options INET6 # IPv6 communications protocols
53d47
< options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
> options QUOTA
> options SMP # Symmetric MultiProcessor Kernel

/boot/loader.conf:
kern.ipc.nmbclusters="65536"

/etc/sysctl.conf:
kern.ipc.somaxconn=1024
net.inet.tcp.recvspace=16384
net.inet.ip.fw.verbose=1
machdep.hyperthreading_allowed=1

Both machines boot with ACPI and hyperthreading enabled.

First I suspected the hardware, so I replaced the entire box (keeping the same drives) - no change - it froze again in less than 24 hours.

Then I disabled ACPI (hint.acpi.0.disabled="1") and hyperthreading - no change - the same thing.
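
As a sketch, those two changes would look roughly like this. The ACPI hint is the one quoted above; setting machdep.hyperthreading_allowed to 0 is only an assumed way of turning hyperthreading off (it can equally be done from the BIOS).

```
# /boot/loader.conf
hint.acpi.0.disabled="1"            # the hint quoted above: boot without ACPI

# /etc/sysctl.conf
machdep.hyperthreading_allowed=0    # assumption: stop scheduling on the logical (HT) CPUs
```

The loader hint only takes effect on the next boot.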

Then after reading all related (I believe) postings here and in freebsd-current, I decided to upgrade both boxes to 6.0-STABLE (I saw a lot of changes in the source tree), but the freezes continued.

I have another machine with the same hardware components (the SCSI based one), but running 5.4-RELEASE. Unlike these two, it's really loaded (even got DDoS-ed a while ago) and I have had zero problems with it for months.

I remember having similar issues when performing 4GB RAM upgrades on a bunch of 4.x based boxes, when I had to set KVA_PAGES to something like 512. For 5.3+ however this no longer seems to be an issue.

I would provide more useful feedback if I had some real and relevant error messages. Actually I got some unusual errors on only one of the affected servers:

Dec 11 02:48:36 xyz kernel: calcru: runtime went backwards from 28636364 usec to 28636021 usec for pid 28588 (httpd)

But it does not seem to be relevant to the problem, as it did not happen anywhere close to the freezes (i.e. it was 26 hours after the last crash and 19 hours before the next one).

Now the only reasonable option for me (I mean for production and in the relatively short term) seems to be going back down to 5.4 and waiting until 6.x gets more stable.

Two dmesg.boot files attached.

Any comments, suggestions and questions are welcome.

Regards,
Atanas

--------------010501090004000608080205
Content-Type: text/plain; name="dmesg.boot-sata"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="dmesg.boot-sata"

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.0-STABLE #0: Fri Dec 9 14:54:05 PST 2005
    root@xyz:/var/obj/usr/src/sys/XYZ
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3192.01-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf43 Stepping = 3
  Features=0xbfebfbff
  Features2=0x641d>
  AMD Features=0x20100000
  Hyperthreading: 2 logical CPUs
real memory = 3757965312 (3583 MB)
avail memory = 3678597120 (3508 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 6
 cpu3 (AP): APIC ID: 7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
ioapic2 irqs 48-71 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: on acpi0
acpi_throttle0: on cpu0
cpu1: on acpi0
acpi_throttle1: on cpu1
acpi_throttle1: failed to attach P_CNT
device_attach: acpi_throttle1 attach returned 6
cpu2: on acpi0
acpi_throttle2: on cpu2
acpi_throttle2: failed to attach P_CNT
device_attach: acpi_throttle2 attach returned 6
cpu3: on acpi0
acpi_throttle3: on cpu3
acpi_throttle3: failed to attach P_CNT
device_attach: acpi_throttle3 attach returned 6
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 0.1 (no driver attached)
pci0: at device 1.0 (no driver attached)
pcib1: irq 16 at device 2.0 on pci0
pci1: on pcib1
pcib2: at device 0.0 on pci1
pci2: on pcib2
fxp0: port 0xdc00-0xdc3f mem 0xfcffe000-0xfcffefff,0xfcfa0000-0xfcfbffff irq 28 at device 2.0 on pci2
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:0e:0c:9c:47:a8
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0xdc80-0xdcbf mem 0xfa000000-0xfbffffff,0xfcfff000-0xfcffffff irq 27 at device 3.0 on pci2
twa0: [FAST]
twa0: WARNING: (0x04: 0x0008): Unclean shutdown detected: unit=0
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-8LP, 8 ports, Firmware FE9X 3.02.00.004, BIOS BE9X 3.01.00.024
pcib3: at device 0.2 on pci1
pci3: on pcib3
pcib4: at device 30.0 on pci0
pci4: on pcib4
pci4: at device 12.0 (no driver attached)
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
pci0: at device 31.3 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xca7ff,0xca800-0xcbfff,0xcc000-0xcd7ff,0xd5000-0xdb7ff on isa0
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM at ata0-master UDMA33
twa0: ERROR: (0x03: 0x01d0): Invalid field in parameter list:
da0 at twa0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 100.000MB/s transfers
da0: 2860962MB (5859250176 512 byte sectors: 255H 63S/T 364721C)
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /home/u1 was not properly dismounted
/home/u1: mount pending error: blocks 888 files 2
WARNING: /home/u2 was not properly dismounted
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 292 files 1
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging disabled
Accounting enabled

--------------010501090004000608080205
Content-Type: text/plain; name="dmesg.boot-scsi"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="dmesg.boot-scsi"

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.0-STABLE #0: Fri Dec 9 11:52:26 PST 2005
    root@xyz:/var/obj/usr/src/sys/XYZ
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3192.01-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf43 Stepping = 3
  Features=0xbfebfbff
  Features2=0x641d>
  AMD Features=0x20100000
  Hyperthreading: 2 logical CPUs
real memory = 3757965312 (3583 MB)
avail memory = 3678597120 (3508 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 6
 cpu3 (AP): APIC ID: 7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
ioapic2 irqs 48-71 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: on acpi0
acpi_throttle0: on cpu0
cpu1: on acpi0
acpi_throttle1: on cpu1
acpi_throttle1: failed to attach P_CNT
device_attach: acpi_throttle1 attach returned 6
cpu2: on acpi0
acpi_throttle2: on cpu2
acpi_throttle2: failed to attach P_CNT
device_attach: acpi_throttle2 attach returned 6
cpu3: on acpi0
acpi_throttle3: on cpu3
acpi_throttle3: failed to attach P_CNT
device_attach: acpi_throttle3 attach returned 6
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 0.1 (no driver attached)
pci0: at device 1.0 (no driver attached)
pcib1: irq 16 at device 2.0 on pci0
pci1: on pcib1
pcib2: at device 0.0 on pci1
pci2: on pcib2
fxp0: port 0xd480-0xd4bf mem 0xfcfd7000-0xfcfd7fff,0xfcf80000-0xfcf9ffff irq 27 at device 3.0 on pci2
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:0e:0c:9c:4c:87
mpt0: port 0xd800-0xd8ff mem 0xfcfc0000-0xfcfcffff,0xfcfb0000-0xfcfbffff irq 26 at device 5.0 on pci2
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.14.0
mpt0: Unhandled Event Notify Frame. Event 0xa.
mpt0: Capabilities: ( RAID-1E RAID-1 SAFTE )
mpt0: 0 Active Volumes (1 Max)
mpt0: 0 Hidden Drive Members (6 Max)
mpt1: port 0xdc00-0xdcff mem 0xfcff0000-0xfcffffff,0xfcfe0000-0xfcfeffff irq 25 at device 5.1 on pci2
mpt1: [GIANT-LOCKED]
mpt1: MPI Version=1.2.14.0
mpt1: Unhandled Event Notify Frame. Event 0xa.
mpt1: Capabilities: ( RAID-1E RAID-1 SAFTE )
mpt1: 0 Active Volumes (1 Max)
mpt1: 0 Hidden Drive Members (6 Max)
pcib3: at device 0.2 on pci1
pci3: on pcib3
pcib4: at device 30.0 on pci0
pci4: on pcib4
pci4: at device 12.0 (no driver attached)
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
atapci1: port 0xcc80-0xcc87,0xcc00-0xcc03,0xc880-0xc887,0xc800-0xc803,0xc480-0xc48f irq 18 at device 31.2 on pci0
atapci1: failed to enable memory mapping!
ata2: on atapci1
ata3: on atapci1
pci0: at device 31.3 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xca7ff,0xca800-0xce7ff on isa0
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM at ata0-master UDMA33
ad4: 476940MB at ata2-master SATA150
Waiting 2 seconds for SCSI devices to settle
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
da1 at mpt0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da0 at mpt0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da3 at mpt0 bus 0 target 3 lun 0
da3: Fixed Direct Access SCSI-3 device
da3: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da3: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da2 at mpt0 bus 0 target 2 lun 0
da2: Fixed Direct Access SCSI-3 device
da2: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da2: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da5 at mpt0 bus 0 target 5 lun 0
da5: Fixed Direct Access SCSI-3 device
da5: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da4 at mpt0 bus 0 target 4 lun 0
da4: Fixed Direct Access SCSI-3 device
da4: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da6 at mpt0 bus 0 target 6 lun 0
da6: Fixed Direct Access SCSI-3 device
da6: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da6: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da7 at mpt0 bus 0 target 8 lun 0
da7: Fixed Direct Access SCSI-3 device
da7: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da7: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging disabled
Accounting enabled

--------------010501090004000608080205--