From owner-freebsd-stable@FreeBSD.ORG Tue Nov 14 19:50:16 2006
Return-Path:
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 8790016A416
    for ; Tue, 14 Nov 2006 19:50:16 +0000 (UTC)
    (envelope-from atanas@asd.aplus.net)
Received: from pro20.abac.com (pro20.abac.com [66.226.64.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 562D443D98
    for ; Tue, 14 Nov 2006 19:50:05 +0000 (GMT)
    (envelope-from atanas@asd.aplus.net)
Received: from [216.55.129.232] ([216.55.129.232]) (authenticated bits=0)
    by pro20.abac.com (8.13.8/8.13.8) with ESMTP id kAEJnuHS070096
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
    for ; Tue, 14 Nov 2006 11:49:57 -0800 (PST)
    (envelope-from atanas@asd.aplus.net)
Message-ID: <455A1DEA.20304@asd.aplus.net>
Date: Tue, 14 Nov 2006 11:50:02 -0800
From: Atanas
User-Agent: Thunderbird 1.5.0.8 (Macintosh/20061025)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: multipart/mixed;
    boundary="------------090304060004050009080503"
X-Spam-Score: 1.47 (SPF_SOFTFAIL)
Subject: twa: Passthru request timed out! Resetting controller...
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 14 Nov 2006 19:50:16 -0000

This is a multi-part message in MIME format.
--------------090304060004050009080503
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Has anyone experienced this:

twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = 0xca839d20
twa0: INFO: (0x16: 0x1108): Resetting controller...:
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
...
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
twa0: INFO: (0x16: 0x1107): Controller reset done!:

This happens on 6.2-PRERELEASE i386 (and on 6.1 since its release) on a
number of machines with the following hardware configuration:

- Tyan K8SE 2892, 2 AMD Opteron 270 CPUs, 4GB RAM
- 3ware 9550SX-8LP, 8 500GB Seagate ST3500641AS SATA drives
  (configured as 8 SINGLE DISK units, aka JBOD)

All hardware components, including the server chassis, are listed in the
3ware hardware compatibility lists. It doesn't seem to be a cabling or
power issue. The controller and hard drives are already flashed to the
latest firmware revisions.

I tried turning off NCQ, but it didn't make any difference. I also tried
switching the kernel from PAE to non-PAE (reducing the usable memory to
3GB), but it didn't help either.

I have other machines with similar I/O configurations (3ware), but with
Intel motherboards and running FreeBSD-5.5, and these have been running
fine for about a year. Now I'm thinking about swapping the drives between
a working Intel box and an AMD-based box, to see whether the controller
timeouts follow the drives.

The problem happens sporadically, about once a month, and is very hard to
reproduce. Sometimes it takes several weeks until the next crash,
sometimes it crashes again in just a few hours.

When it happens, the kernel sometimes panics (most likely due to the
inconsistent filesystem state caused by the controller reset) and
sometimes just hangs. The hang can be interrupted (I have a serial
console), but the only usable thing after that seems to be
"call cpu_reset()", followed by a full (and sometimes painfully long)
filesystem check.
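
Since the timeouts are rare and spread over several machines, it may help
to pull the twa messages out of the logs mechanically. The following is
only a minimal sketch, assuming the messages always have the shape shown
above (device, severity, two hex numbers in parentheses, message text,
optional detail). The names parse_twa_line, count_timeouts and the field
names "first" and "second" are made up for illustration; they are not
taken from the driver.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Pattern inferred from the log lines quoted in this message only,
# e.g. "twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = ..."
TWA_LINE = re.compile(
    r"^(?P<device>twa\d+): (?P<severity>[A-Z]+): "
    r"\((?P<first>0x[0-9A-Fa-f]+): (?P<second>0x[0-9A-Fa-f]+)\): "
    r"(?P<message>.*?):\s*(?P<detail>.*)$"
)


class TwaEvent(NamedTuple):
    device: str
    severity: str
    first: int    # first hex number in the parentheses (meaning not assumed here)
    second: int   # second hex number in the parentheses
    message: str
    detail: str


def parse_twa_line(line: str) -> Optional[TwaEvent]:
    """Return a TwaEvent for a twa log line, or None if the line does not match."""
    m = TWA_LINE.match(line.strip())
    if m is None:
        return None
    return TwaEvent(
        m["device"],
        m["severity"],
        int(m["first"], 16),
        int(m["second"], 16),
        m["message"],
        m["detail"],
    )


def count_timeouts(lines: Iterable[str]) -> int:
    """Count ERROR events whose message mentions a timeout."""
    events = (parse_twa_line(line) for line in lines)
    return sum(
        1
        for e in events
        if e is not None and e.severity == "ERROR" and "timed out" in e.message
    )


# The lines quoted above, used as sample input.
SAMPLE_LOG = [
    "twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = 0xca839d20",
    "twa0: INFO: (0x16: 0x1108): Resetting controller...:",
    "twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0",
    "twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7",
    "twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1",
    "twa0: INFO: (0x16: 0x1107): Controller reset done!:",
]

if __name__ == "__main__":
    for line in SAMPLE_LOG:
        print(parse_twa_line(line))
    print("timeouts:", count_timeouts(SAMPLE_LOG))
```

Feeding it /var/log/messages from each affected box would give a rough
count of timeouts per machine, which could be compared before and after
the drive swap.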

Here are the diffs against the default GENERIC and PAE kernel
configurations:

< cpu       I486_CPU
< ident     GENERIC
< options   INET6             # IPv6 communications protocols
< options   SCSI_DELAY=5000   # Delay (in ms) before probing SCSI
> options   QUOTA
> options   SMP               # Symmetric MultiProcessor Kernel
> options   BREAK_TO_DEBUGGER
> options   DDB
> options   KDB
> options   KDB_UNATTENDED
> options   IPFIREWALL
> options   DUMMYNET

I'm attaching the dmesg.boot following the latest crash.

Regards,
Atanas

--------------090304060004050009080503
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0";
    name="dmesg.boot"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
    filename="dmesg.boot"

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-PRERELEASE #0: Mon Nov 13 17:47:40 PST 2006
    root@xyz:/var/obj/usr/src/sys/XYZ-PAE
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual Core AMD Opteron(tm) Processor 270 (2009.27-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
  Features=0x178bfbff
  Features2=0x1
  AMD Features=0xe2500800
  AMD Features2=0x3
  Cores per package: 2
real memory  = 5368709120 (5120 MB)
avail memory = 4182241280 (3988 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-27 on motherboard
ioapic2 irqs 28-31 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 0.0 (no driver attached)
isab0: at device 1.0 on pci0
isa0: on isab0
pci0: at device 1.1 (no driver attached)
pci0: at device 2.0 (no driver attached)
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1400-0x140f at device 6.0 on pci0
ata0: on atapci0
ata1: on atapci0
pcib1: at device 9.0 on pci0
pci1: on pcib1
pci1: at device 6.0 (no driver attached)
fxp0: port 0x2400-0x243f mem 0xda101000-0xda101fff,0xda120000-0xda13ffff irq 16 at device 8.0 on pci1
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:e0:81:33:b5:f1
pcib2: at device 13.0 on pci0
pci2: on pcib2
pcib3: at device 14.0 on pci0
pci3: on pcib3
pcib4: port 0xcf8-0xcff on acpi0
pci24: on pcib4
pcib5: at device 10.0 on pci24
pci25: on pcib5
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0x3000-0x303f mem 0xde000000-0xdfffffff,0xdc300000-0xdc300fff irq 27 at device 3.0 on pci25
twa0: [FAST]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-8LP, 8 ports, Firmware FE9X 3.04.01.011, BIOS BE9X 3.04.00.002
pci24: at device 10.1 (no driver attached)
pcib6: at device 11.0 on pci24
pci26: on pcib6
bge0: mem 0xdc410000-0xdc41ffff,0xdc400000-0xdc40ffff irq 28 at device 9.0 on pci26
miibus1: on bge0
brgphy0: on miibus1
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:e0:81:33:b6:f4
bge1: mem 0xdc430000-0xdc43ffff,0xdc420000-0xdc42ffff irq 29 at device 9.1 on pci26
miibus2: on bge1
brgphy1: on miibus2
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:e0:81:33:b6:f5
pci24: at device 11.1 (no driver attached)
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc97ff on isa0
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging disabled
da0 at twa0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 100.000MB/s transfers
da0: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da1 at twa0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 100.000MB/s transfers
da1: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da2 at twa0 bus 0 target 2 lun 0
da2: Fixed Direct Access SCSI-3 device
da2: 100.000MB/s transfers
da2: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da3 at twa0 bus 0 target 3 lun 0
da3: Fixed Direct Access SCSI-3 device
da3: 100.000MB/s transfers
da3: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da4 at twa0 bus 0 target 4 lun 0
da4: Fixed Direct Access SCSI-3 device
da4: 100.000MB/s transfers
da4: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da5 at twa0 bus 0 target 5 lun 0
da5: Fixed Direct Access SCSI-3 device
da5: 100.000MB/s transfers
da5: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da6 at twa0 bus 0 target 6 lun 0
da6: Fixed Direct Access SCSI-3 device
da6: 100.000MB/s transfers
da6: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da7 at twa0 bus 0 target 7 lun 0
da7: Fixed Direct Access SCSI-3 device
da7: 100.000MB/s transfers
da7: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
/: mount pending error: blocks 208 files 5

--------------090304060004050009080503--
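
As a sanity check on the sizes in the dmesg output above, the arithmetic
can be redone by hand. This is only an illustrative sketch: the helper
names are invented, and it assumes the "MB" printed by the kernel means
2^20 bytes, which is consistent with the numbers shown.

```python
MIB = 1024 * 1024  # assumption: dmesg "MB" is 2**20 bytes


def bytes_to_mib(n_bytes: int) -> int:
    """Whole mebibytes in n_bytes, rounded down."""
    return n_bytes // MIB


def disk_size_mib(sectors: int, sector_bytes: int = 512) -> int:
    """Disk size in whole mebibytes from a sector count."""
    return bytes_to_mib(sectors * sector_bytes)


def cylinders(sectors: int, heads: int, sectors_per_track: int) -> int:
    """Number of complete cylinders for the given CHS geometry."""
    return sectors // (heads * sectors_per_track)


if __name__ == "__main__":
    # da0..da7: "476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)"
    print(disk_size_mib(976541696))       # 476827
    print(cylinders(976541696, 255, 63))  # 60786

    # "real memory  = 5368709120 (5120 MB)"
    print(bytes_to_mib(5368709120))       # 5120
    # "avail memory = 4182241280 (3988 MB)"
    print(bytes_to_mib(4182241280))       # 3988
```

All four values match what the kernel printed, so the controller is
exporting each unit at the size the kernel reports for it.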