From owner-freebsd-stable@FreeBSD.ORG Wed Feb 8 21:27:25 2012 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 91AB8106564A for ; Wed, 8 Feb 2012 21:27:25 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.westchester.pa.mail.comcast.net (qmta07.westchester.pa.mail.comcast.net [76.96.62.64]) by mx1.freebsd.org (Postfix) with ESMTP id 3E98C8FC08 for ; Wed, 8 Feb 2012 21:27:24 +0000 (UTC) Received: from omta23.westchester.pa.mail.comcast.net ([76.96.62.74]) by qmta07.westchester.pa.mail.comcast.net with comcast id XY9D1i0391c6gX857ZTRPl; Wed, 08 Feb 2012 21:27:25 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta23.westchester.pa.mail.comcast.net with comcast id XZTQ1i00j1t3BNj3jZTQ6G; Wed, 08 Feb 2012 21:27:25 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 12883102C1E; Wed, 8 Feb 2012 13:27:23 -0800 (PST) Date: Wed, 8 Feb 2012 13:27:23 -0800 From: Jeremy Chadwick To: Mike Tancsa Message-ID: <20120208212723.GA26263@icarus.home.lan> References: <4F32E289.4080806@sentex.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F32E289.4080806@sentex.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: FreeBSD-STABLE Mailing List Subject: Re: siisch1: Error while READ LOG EXT X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Feb 2012 21:27:25 -0000 On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote: > I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st. > > siis0@pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00 > vendor = 'Silicon Image Inc (Was: CMD Technology Inc)' > device = 'PCI-X to Serial ATA Controller (SiI 3124)' > class = mass storage > subclass = RAID > bar [10] = type Memory, range 64, base 0xb4408000, size 128, enabled > bar [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled > bar [20] = type I/O Port, range 32, base 0x3000, size 16, enabled > cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0 > cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions > cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message > > siis0: port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5 > siis0: [ITHREAD] > siisch0: at channel 0 on siis0 > siisch0: [ITHREAD] > siisch1: at channel 1 on siis0 > siisch1: [ITHREAD] > siisch2: at channel 2 on siis0 > siisch2: [ITHREAD] > siisch3: at channel 3 on siis0 > siisch3: [ITHREAD] > > # camcontrol devlist > at scbus0 target 0 lun 0 (pass0,ada0) > at scbus0 target 1 lun 0 (pass1,ada1) > at scbus0 target 2 lun 0 (pass2,ada2) > at scbus0 target 3 lun 0 (pass3,ada3) > at scbus0 target 15 lun 0 (pass4,pmp1) > at scbus1 target 0 lun 0 (pass5,ada4) > at scbus1 target 1 lun 0 (pass6,ada5) > at scbus1 target 2 lun 0 (pass7,ada6) > at scbus1 target 3 lun 0 (pass8,ada7) > at scbus1 target 4 lun 0 (pass9,ada8) > at scbus1 target 15 lun 0 (pass10,pmp0) > at scbus4 target 0 lun 0 (pass11,da0) > at scbus4 target 0 lun 1 (pass12,da1) > at scbus4 target 16 lun 0 (pass13) > at scbus5 target 0 lun 0 (pass14,da2) > at scbus6 target 0 lun 0 (pass15,ada9) > at scbus7 target 0 lun 0 (pass16,ada10) > at scbus8 target 0 lun 0 (pass17,ada11) > at scbus11 target 0 lun 0 (pass18,ada12) > > > Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with a the odd slot timeout error. > > > Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000 > Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26 > Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 01000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 This indicates the controller on channel 1 (siisch1) is "stalled" waiting for underlying communication with the device attached to it. > Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 01:33:52 backup3 last message repeated 2 times > Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 01:50:31 backup3 last message repeated 2 times > Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT This indicates the underlying device was handed a READ LOG EXT ATA command (command 0x2f) and the device did not respond promptly (resulting in the timeout messages you see). > smartctl doesnt show any issues on the drives other than one that has some historical errors from a while ago. What are these errors and do I need to worry about them ? The "READ LOG EXT" ones are new. > > {snipping SMART stats} You're focused heavily on the READ LOG EXT command. READ LOG EXT is intended for accessing the GP Log section of a drive. EXT stands for "Extended". "GP Log" means "General Purpose Log", and is where all sorts of logging information regarding drive performance is stored. It's usually stored within a reserved section of the platters, or in the HPA area. It's not within a "standard" user-accessible LBA/sector region. This is a completely separate log from that of SMART logs. You can review the different types of "logs" on a device by reviewing the ATA8-ACS specification here. See Annex A, section A.1, page 362: http://www.t13.org/documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf This is almost certainly a lower level problem with the disk that cannot be addressed/solved via normal means. Thus, my recommendation is to replace the disk. If you would rather not replace the disk, I can try to step you through looking at the GPLog sections of the disk to see if you can trigger the problem -- and I have a feeling you'll be able to, but I won't necessarily be able to tell you where the actual problem lies hardware-wise, nor will I be able to solve the problem. Regarding the repeated errors at semi-regular (but not entirely) intervals: are you using smartd? Do you have a cronjob that issues smartctl -a or smartctl -x commands at intervals? I imagine any of these could be tickling something lower level. Also, please upgrade your smartmontools to 5.42. It does provide some further enhancements that are useful. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |