Date: Wed, 8 Feb 2012 13:27:23 -0800 From: Jeremy Chadwick <freebsd@jdc.parodius.com> To: Mike Tancsa <mike@sentex.net> Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org> Subject: Re: siisch1: Error while READ LOG EXT Message-ID: <20120208212723.GA26263@icarus.home.lan> In-Reply-To: <4F32E289.4080806@sentex.net> References: <4F32E289.4080806@sentex.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote: > I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st. > > siis0@pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00 > vendor = 'Silicon Image Inc (Was: CMD Technology Inc)' > device = 'PCI-X to Serial ATA Controller (SiI 3124)' > class = mass storage > subclass = RAID > bar [10] = type Memory, range 64, base 0xb4408000, size 128, enabled > bar [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled > bar [20] = type I/O Port, range 32, base 0x3000, size 16, enabled > cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0 > cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions > cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message > > siis0: <SiI3124 SATA controller> port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5 > siis0: [ITHREAD] > siisch0: <SIIS channel> at channel 0 on siis0 > siisch0: [ITHREAD] > siisch1: <SIIS channel> at channel 1 on siis0 > siisch1: [ITHREAD] > siisch2: <SIIS channel> at channel 2 on siis0 > siisch2: [ITHREAD] > siisch3: <SIIS channel> at channel 3 on siis0 > siisch3: [ITHREAD] > > # camcontrol devlist > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 0 lun 0 (pass0,ada0) > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 1 lun 0 (pass1,ada1) > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 2 lun 0 (pass2,ada2) > <WDC WD2001FASS-00U0B0 01.00101> at scbus0 target 3 lun 0 (pass3,ada3) > <Port Multiplier 47261095 1f06> at scbus0 target 15 lun 0 (pass4,pmp1) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 0 lun 0 (pass5,ada4) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 1 lun 0 (pass6,ada5) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 2 lun 0 (pass7,ada6) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 3 lun 0 (pass8,ada7) > <WDC WD2002FAEX-007BA0 05.01D05> at scbus1 target 4 lun 0 (pass9,ada8) > <Port Multiplier 37261095 1706> at scbus1 target 15 lun 0 (pass10,pmp0) > <Areca usrvar R001> at scbus4 target 0 lun 0 (pass11,da0) > <Areca backup1 R001> at scbus4 target 0 lun 1 (pass12,da1) > <Areca RAID controller R001> at scbus4 target 16 lun 0 (pass13) > <AMCC 9650SE-2LP DISK 4.10> at scbus5 target 0 lun 0 (pass14,da2) > <ST31000333AS SD35> at scbus6 target 0 lun 0 (pass15,ada9) > <ST31000528AS CC35> at scbus7 target 0 lun 0 (pass16,ada10) > <ST31000340AS SD1A> at scbus8 target 0 lun 0 (pass17,ada11) > <WDC WD1002FAEX-00Z3A0 05.01D05> at scbus11 target 0 lun 0 (pass18,ada12) > > > Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with a the odd slot timeout error. > > > Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000 > Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26 > Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 > Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 01000000 > Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24 > Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000 This indicates the controller on channel 1 (siisch1) is "stalled" waiting for underlying communication with the device attached to it. > Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 01:33:52 backup3 last message repeated 2 times > Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 01:50:31 backup3 last message repeated 2 times > Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT > Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT This indicates the underlying device was handed a READ LOG EXT ATA command (command 0x2f) and the device did not respond promptly (resulting in the timeout messages you see). > smartctl doesnt show any issues on the drives other than one that has some historical errors from a while ago. What are these errors and do I need to worry about them ? The "READ LOG EXT" ones are new. > > {snipping SMART stats} You're focused heavily on the READ LOG EXT command. READ LOG EXT is intended for accessing the GP Log section of a drive. EXT stands for "Extended". "GP Log" means "General Purpose Log", and is where all sorts of logging information regarding drive performance is stored. It's usually stored within a reserved section of the platters, or in the HPA area. It's not within a "standard" user-accessible LBA/sector region. This is a completely separate log from that of SMART logs. You can review the different types of "logs" on a device by reviewing the ATA8-ACS specification here. See Annex A, section A.1, page 362: http://www.t13.org/documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf This is almost certainly a lower level problem with the disk that cannot be addressed/solved via normal means. Thus, my recommendation is to replace the disk. If you would rather not replace the disk, I can try to step you through looking at the GPLog sections of the disk to see if you can trigger the problem -- and I have a feeling you'll be able to, but I won't necessarily be able to tell you where the actual problem lies hardware-wise, nor will I be able to solve the problem. Regarding the repeated errors at semi-regular (but not entirely) intervals: are you using smartd? Do you have a cronjob that issues smartctl -a or smartctl -x commands at intervals? I imagine any of these could be tickling something lower level. Also, please upgrade your smartmontools to 5.42. It does provide some further enhancements that are useful. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120208212723.GA26263>