From: Alexander Motin
Date: Thu, 09 Feb 2012 00:22:40 +0200
To: Jeremy Chadwick
Cc: freebsd-stable@freebsd.org
Subject: Re: siisch1: Error while READ LOG EXT

On 08.02.2012 23:27, Jeremy Chadwick wrote:
> On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote:
>> I have a 4-port eSATA PCIe card with 3 external port multipliers attached
>> to an AMD64 box (8 GB of RAM), running RELENG_8 from Feb 1st.
>>
>> siis0@pci0:5:0:0: class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00
>>     vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
>>     device     = 'PCI-X to Serial ATA Controller (SiI 3124)'
>>     class      = mass storage
>>     subclass   = RAID
>>     bar   [10] = type Memory, range 64, base 0xb4408000, size 128, enabled
>>     bar   [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled
>>     bar   [20] = type I/O Port, range 32, base 0x3000, size 16, enabled
>>     cap 01[64] = powerspec 2 supports D0 D1 D2 D3 current D0
>>     cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
>>     cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message
>>
>> siis0: port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5
>> siis0: [ITHREAD]
>> siisch0: at channel 0 on siis0
>> siisch0: [ITHREAD]
>> siisch1: at channel 1 on siis0
>> siisch1: [ITHREAD]
>> siisch2: at channel 2 on siis0
>> siisch2: [ITHREAD]
>> siisch3: at channel 3 on siis0
>> siisch3: [ITHREAD]
>>
>> # camcontrol devlist
>> at scbus0 target 0 lun 0 (pass0,ada0)
>> at scbus0 target 1 lun 0 (pass1,ada1)
>> at scbus0 target 2 lun 0 (pass2,ada2)
>> at scbus0 target 3 lun 0 (pass3,ada3)
>> at scbus0 target 15 lun 0 (pass4,pmp1)
>> at scbus1 target 0 lun 0 (pass5,ada4)
>> at scbus1 target 1 lun 0 (pass6,ada5)
>> at scbus1 target 2 lun 0 (pass7,ada6)
>> at scbus1 target 3 lun 0 (pass8,ada7)
>> at scbus1 target 4 lun 0 (pass9,ada8)
>> at scbus1 target 15 lun 0 (pass10,pmp0)
>> at scbus4 target 0 lun 0 (pass11,da0)
>> at scbus4 target 0 lun 1 (pass12,da1)
>> at scbus4 target 16 lun 0 (pass13)
>> at scbus5 target 0 lun 0 (pass14,da2)
>> at scbus6 target 0 lun 0 (pass15,ada9)
>> at scbus7 target 0 lun 0 (pass16,ada10)
>> at scbus8 target 0 lun 0 (pass17,ada11)
>> at scbus11 target 0 lun 0 (pass18,ada12)
>>
>>
>> Ever since I added a new PM, I have been seeing a new error
>> (READ LOG EXT) along with the odd slot timeout error.
>>
>>
>> Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 47000000
>> Feb 7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26
>> Feb 7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>> Feb 7 23:49:32 backup3 kernel: siisch1: ... waiting for slots 43000000
>> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30
>> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>> Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 03000000
>> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25
>> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>> Feb 7 23:49:34 backup3 kernel: siisch1: ... waiting for slots 01000000
>> Feb 7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
>> Feb 7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>
> This indicates the controller on channel 1 (siisch1) is "stalled"
> waiting for underlying communication with the device attached to it.
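
The "waiting for slots" and "serr" fields in the log above are raw hexadecimal values. A minimal Python sketch for reading them follows. It rests on two assumptions that siis(4) documentation does not confirm. First, the slot value is a 32-bit mask with bit N set for command slot N. Second, "serr" uses the standard SATA SError bit layout, the same layout as AHCI's PxSERR register. The function names are hypothetical helpers and are not part of FreeBSD.

```python
"""Hypothetical helpers for decoding fields of siis(4) timeout messages.

Assumptions: "waiting for slots" is a 32-bit mask with bit N set for
command slot N, and "serr" uses the standard SATA SError bit layout.
"""

# SError bits: ERR field in bits 15:0, DIAG field in bits 31:16.
SERROR_BITS = {
    0: "recovered data integrity error",
    1: "recovered communications error",
    8: "transient data integrity error",
    9: "persistent communication or data integrity error",
    10: "protocol error",
    11: "internal error",
    16: "PhyRdy change",
    17: "Phy internal error",
    18: "Comm Wake",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unknown FIS type",
    26: "exchanged",
}


def slots_from_mask(hex_mask: str) -> list[int]:
    """Return the slot numbers whose bits are set in a hex slot mask."""
    mask = int(hex_mask, 16)
    return [slot for slot in range(32) if mask & (1 << slot)]


def decode_serror(hex_value: str) -> list[str]:
    """Return descriptions of the SError bits set in a hex register value."""
    value = int(hex_value, 16)
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the siisch1 log excerpt.
    for mask in ("47000000", "43000000", "03000000", "01000000"):
        print(f"waiting for slots {mask}: {slots_from_mask(mask)}")
    print(f"serr 00680000: {decode_serror('00680000')}")
```

Applied to the excerpt, the masks decode in order to slots [24, 25, 26, 30], [24, 25, 30], [24, 25] and [24]. Each timeout removes one slot from the mask: 26, then 30, then 25. That leaves only 24, which times out last.

If the SError assumption holds, 00680000 sets three bits: 10b to 8b decode error, CRC error, and handshake error. All three are link-layer indicators rather than media errors.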
>
>> Feb 7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 01:33:52 backup3 last message repeated 2 times
>> Feb 8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 01:50:31 backup3 last message repeated 2 times
>> Feb 8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb 8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT
>
> This indicates the underlying device was handed a READ LOG EXT ATA
> command (command 0x2f) and the device did not respond promptly
> (resulting in the timeout messages you see).

There are hours between the timeouts and the READ LOG EXT errors. They
are not directly related, but may have the same cause.

>> smartctl doesn't show any issues on the drives other than one that has
>> some historical errors from a while ago. What are these errors and do I
>> need to worry about them? The "READ LOG EXT" ones are new.
>>
>> {snipping SMART stats}
>
> You're focused heavily on the READ LOG EXT command. READ LOG EXT is
> intended for accessing the GP Log section of a drive. EXT stands for
> "Extended". "GP Log" means "General Purpose Log", and is where all
> sorts of logging information regarding drive performance is stored.
> It's usually stored within a reserved section of the platters, or in the
> HPA area. It's not within a "standard" user-accessible LBA/sector
> region.
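
Some context on the READ LOG EXT (0x2F) command discussed here. After an NCQ command fails, the host can read General Purpose log address 10h, the NCQ Command Error log listed in ATA8-ACS Annex A, to learn which queued command failed and why. The Python sketch below decodes one 512-byte page of that log. The byte offsets follow the ATA8-ACS layout and should be checked against the specification before relying on them. The function names and the synthetic test page are hypothetical. They do not come from this thread or from the siis(4) driver.

```python
"""Hypothetical decoder for one page of the ATA NCQ Command Error log.

Assumes the ATA8-ACS layout of General Purpose log address 10h, read with
READ LOG EXT (opcode 0x2F); check offsets against the spec before use.
"""
from dataclasses import dataclass

READ_LOG_EXT = 0x2F    # ATA opcode of READ LOG EXT
NCQ_ERROR_LOG = 0x10   # GP log address of the NCQ Command Error log
PAGE_SIZE = 512


@dataclass
class NcqError:
    queued: bool  # True if the NQ bit is clear, i.e. the tag field is valid
    tag: int      # NCQ tag of the failed command
    status: int   # ATA Status register value
    error: int    # ATA Error register value
    lba: int      # 48-bit LBA of the failed command
    count: int    # sector count of the failed command


def parse_ncq_error_log(page: bytes) -> NcqError:
    if len(page) != PAGE_SIZE:
        raise ValueError("log page must be 512 bytes")
    # Byte 511 is a checksum chosen so all 512 bytes sum to 0 modulo 256.
    if sum(page) % 256 != 0:
        raise ValueError("bad log page checksum")
    # LBA bits 23:0 are in bytes 4-6 and bits 47:24 are in bytes 8-10.
    lba = int.from_bytes(page[4:7] + page[8:11], "little")
    return NcqError(
        queued=not (page[0] & 0x80),
        tag=page[0] & 0x1F,
        status=page[2],
        error=page[3],
        lba=lba,
        count=page[12] | (page[13] << 8),
    )


def build_test_page(tag: int, status: int, error: int, lba: int) -> bytes:
    """Build a synthetic, checksummed log page (not data from this report)."""
    page = bytearray(PAGE_SIZE)
    page[0] = tag & 0x1F
    page[2], page[3] = status, error
    raw = lba.to_bytes(6, "little")
    page[4:7], page[8:11] = raw[:3], raw[3:]
    page[12] = 8  # hypothetical 8-sector request
    page[PAGE_SIZE - 1] = -sum(page) % 256
    return bytes(page)


if __name__ == "__main__":
    page = build_test_page(tag=26, status=0x41, error=0x40, lba=0x123456789A)
    print(parse_ncq_error_log(page))
```

When the NQ bit is clear, the tag identifies which outstanding NCQ command failed. The host can then report that command's Status, Error and LBA and reissue the other aborted commands. If the READ LOG EXT command itself fails, as the "Error while READ LOG EXT" messages indicate, this per-command detail is not available.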
This is a completely separate log from the SMART logs. The READ LOG EXT
commands here are used to fetch the status of failed NCQ commands. It is
the normal (and the only) way to get a detailed error status in that
case. An error on the READ LOG EXT command itself may mean that this is
not a regular media error, but rather a problem with communication,
firmware, or something else.

> You can review the different types of "logs" on a device by reviewing
> the ATA8-ACS specification here. See Annex A, section A.1, page 362:
>
> http://www.t13.org/documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf
>
> This is almost certainly a lower level problem with the disk that cannot
> be addressed/solved via normal means. Thus, my recommendation is to
> replace the disk.
>
> If you would rather not replace the disk, I can try to step you through
> looking at the GPLog sections of the disk to see if you can trigger the
> problem -- and I have a feeling you'll be able to, but I won't
> necessarily be able to tell you where the actual problem lies
> hardware-wise, nor will I be able to solve the problem.
>
> Regarding the repeated errors at semi-regular (but not entirely)
> intervals: are you using smartd? Do you have a cronjob that issues
> smartctl -a or smartctl -x commands at intervals? I imagine any of
> these could be tickling something lower level.
>
> Also, please upgrade your smartmontools to 5.42. It does provide some
> further enhancements that are useful.

-- 
Alexander Motin