Date: Thu, 16 Nov 2006 08:30:17 +0200
From: "Clayton Milos" <clay@milos.co.za>
To: "Atanas" <atanas@asd.aplus.net>, "Mark Dotson" <mark@dmglobal.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: twa: Passthru request timed out! Resetting controller...
Message-ID: <01fe01c70948$aeecac70$9603a8c0@claylaptop>
References: <455A1DEA.20304@asd.aplus.net> <455A32B7.9080304@dmglobal.net> <455BC7F0.8080203@asd.aplus.net>

----- Original Message -----
From: "Atanas" <atanas@asd.aplus.net>
To: "Mark Dotson" <mark@dmglobal.net>
Cc: <freebsd-stable@freebsd.org>
Sent: Thursday, November 16, 2006 4:07 AM
Subject: Re: twa: Passthru request timed out! Resetting controller...

> Mark Dotson said the following on 11/14/06 1:18 PM:
>> I've had continued problems with the 3ware series SATA cards and the Tyan
>> boards. Specifically, I have a "Tyan S5360-1U" and both a 9500S-4LP and
>> a 8506 series 3ware cards.
>>
>> In my case the first error is different, but the 'resetting' over and
>> over is VERY familiar. This could be triggered by a simple file copy
>> from one part of a container to another; degrading the unit and
>> triggering the resetting crap. Note that the drives are fine, I tested
>> that first thing.
>>
>> Sep 8 11:59:23 localhost kernel: 3w-9xxx: scsi0: WARNING: (0x06:0x002C): Unit #1: Command (0x2a) timed out, resetting card.
>> Sep 8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=0.
>> Sep 8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=1.
>>
>> I also found this problem to exist across platforms, not just FreeBSD.
>> For example, the excerpt above is from a CentOS box.
>>
>> All tests were done with newest firmware for both card and mobo, and
>> using the newest drivers provided by 3ware.
>>
>> Once I removed the card and drives from the Tyan system and stuck them in
>> pretty much ANY other system, they worked fantastically.
>>
>> I don't have an answer for the "resetting problem" as of yet... 3ware and
>> Tyan (And my system vendor "Appro") are still trying to find my specific
>> problem and solve it. I believe they are currently doing the "replace
>> everything" method of troubleshooting.
>>
> Mark, thank you.
>
> It's good to know that the resetting problem exist on other platforms too.
>
> We already found out that replacing the entire box with identical one
> doesn't help, so unfortunately we'll have to start replacing components by
> using different brands or models.
>
> I wouldn't like to touch the I/O subsystem (these are already loaded
> production machines), so like you said, the safest bet would be to try
> another motherboard.
>
> However I don't see many Dual Opteron based boards suggested by the
> 3ware's compatibility list. The next one that comes in mind from that list
> is Supermicro H8DC8, but it looks more like a gamers dream (High-End PCI-e
> Graphics, SLI, etc. but no on-board VGA) than a server board.
>
> I'm quite surprised that the top Opteron based motherboard manufacturer
> listed in the 3ware web site motherboard compatibility docs:
> http://3ware.com/products/pdf/Motherboard_compatibility_list_9550SX_2006_06.pdf
> makes 2 out of 5 boards that are marked as compatible, but perform so bad
> with 3ware cards.
>
> I know what happens here in this mailing list when somebody looks for good
> SATA cards (Re: 3ware, 3ware, ...), I replied myself too.
>
> So are there any success stories with 3ware 9550SX (SATA II) and dual AMD
> Opteron server boards, or it's time to go back with Intel?
>
> Regards,
> Atanas

It's time to go with another SATA2 RAID controller card.

I have an Areca 8-port PCI-X controller card (www.areca.com.tw). I'm running
it on a Tyan Thunder motherboard with dual AthlonMP CPUs and I've had no
issues with it yet.

I've got 8 drives on it in 2 volumes of 4 drives each, and I'm getting what
I consider to be good read/write speeds to the array. It also supports many
things that 3ware did not at the time I bought it, such as online volume
expansion.

homer# dd if=/dev/zero of=test.file bs=65536 count=16384
16384+0 records in
16384+0 records out
1073741824 bytes transferred in 7.000588 secs (153378801 bytes/sec)

-Clay

>
>
>> Atanas wrote:
>>> Has anyone experiencing this:
>>>
>>> twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = 0xca839d20
>>> twa0: INFO: (0x16: 0x1108): Resetting controller...:
>>> twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
>>> ...
>>> twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7
>>> twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
>>> twa0: INFO: (0x16: 0x1107): Controller reset done!:
>>>
>>> This happens on 6.2-PRERELEASE i386 (and on 6.1 since its release) on a
>>> number of machines with the following hardware configuration:
>>>
>>> - Tyan K8SE 2892, 2 AMD Opteron 270 CPUs, 4GB RAM
>>> - 3ware 9550SX-8LP, 8 500GB Seagate ST3500641AS SATA drives
>>> (configured as 8 SINGLE DISK units, aka JBOD)
>>>
>>> All hardware components, including the server chassis, are listed in the
>>> 3ware hardware compatibility lists. It doesn't seem to be a cabling or
>>> power issue. The controller and hard drives are already flashed to the
>>> latest firmware revisions. I tried turning off NCQ, but it didn't make
>>> any difference. I tried also switching the kernel from PAE to non-PAE
>>> (reducing the usable memory to 3GB), but it didn't help either.
>>>
>>> I have another machines with similar I/O configurations (3ware), but
>>> with Intel motherboards and running FreeBSD-5.5, and these run fine for
>>> about a year already. Now I'm thinking about swapping the drives between
>>> a working Intel and AMD based box, to see where controller timeouts will
>>> follow.
>>>
>>> The problem happens sporadically once in a month or so and is very hard
>>> to reproduce. Sometimes it takes several weeks until the next crash
>>> happens, sometimes it crashes again in just a few hours.
>>>
>>> When the thing happens, the kernel sometimes panics (most likely due to
>>> the inconsistent filesystem state caused by the controller reset),
>>> sometimes just hangs. It can be interrupted (I have a serial console),
>>> but the only usable thing after that seems to be "call cpu_reset()",
>>> followed by full (and sometimes painfully long) filesystem check.
>>>
>>> Here are the diffs against the default GENERIC and PAE kernel
>>> configurations:
>>>
>>> < cpu I486_CPU
>>> < ident GENERIC
>>> < options INET6 # IPv6 communications protocols
>>> < options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
>>>
>>> > options QUOTA
>>> > options SMP # Symmetric MultiProcessor Kernel
>>> > options BREAK_TO_DEBUGGER
>>> > options DDB
>>> > options KDB
>>> > options KDB_UNATTENDED
>>>
>>> > options IPFIREWALL
>>> > options DUMMYNET
>>>
>>> I'm attaching the dmesg.boot following the latest crash.
>>>
>>> Regards,
>>> Atanas
>>>
>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>