From owner-freebsd-stable@FreeBSD.ORG  Wed Feb  8 22:22:53 2012
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AC02A10656D0
	for <freebsd-stable@freebsd.org>; Wed,  8 Feb 2012 22:22:53 +0000 (UTC)
	(envelope-from mavbsd@gmail.com)
Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 33C3A8FC16
	for <freebsd-stable@freebsd.org>; Wed,  8 Feb 2012 22:22:52 +0000 (UTC)
Received: by eekb47 with SMTP id b47so395266eek.13
	for <freebsd-stable@freebsd.org>; Wed, 08 Feb 2012 14:22:52 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=jGgLq1ocNW/GO/IFLBKIDYKVjkOyZLkymMvR+WNKBfI=;
	b=AqeN0a54lIp1RRVnvP6LsXqnCODcfRKD7z/Qea6crE+XdaJq/UmI151jkVzZ3m4y+W
	98PjfMs79uJ6wldsFwgq24kZs8mfCDzI1k+IPdFlDtlkYAJp0IxgswYxAR/1hgLkLxVd
	+zoVVFSgHKX27C9OBKZBl39YMefhD8Iral6bU=
Received: by 10.14.40.14 with SMTP id e14mr9140711eeb.18.1328739771187;
	Wed, 08 Feb 2012 14:22:51 -0800 (PST)
Received: from mavbook.mavhome.dp.ua (pc.mavhome.dp.ua. [212.86.226.226])
	by mx.google.com with ESMTPS id c16sm2182152eei.1.2012.02.08.14.22.49
	(version=SSLv3 cipher=OTHER); Wed, 08 Feb 2012 14:22:50 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <4F32F5B0.2060203@FreeBSD.org>
Date: Thu, 09 Feb 2012 00:22:40 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:9.0) Gecko/20120116 Thunderbird/9.0
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4F32E289.4080806@sentex.net>
	<mailpost.1328736521.3202974.81071.mailing.freebsd.stable@FreeBSD.cs.nctu.edu.tw>
In-Reply-To: <mailpost.1328736521.3202974.81071.mailing.freebsd.stable@FreeBSD.cs.nctu.edu.tw>
Content-Type: text/plain; charset=KOI8-R; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@freebsd.org
Subject: Re: siisch1: Error while READ LOG EXT
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Feb 2012 22:22:53 -0000

On 08.02.2012 23:27, Jeremy Chadwick wrote:
> On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote:
>> I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st.
>>
>> siis0@pci0:5:0:0:       class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00
>>      vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
>>      device     = 'PCI-X to Serial ATA Controller (SiI 3124)'
>>      class      = mass storage
>>      subclass   = RAID
>>      bar   [10] = type Memory, range 64, base 0xb4408000, size 128, enabled
>>      bar   [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled
>>      bar   [20] = type I/O Port, range 32, base 0x3000, size 16, enabled
>>      cap 01[64] = powerspec 2  supports D0 D1 D2 D3  current D0
>>      cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
>>      cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message
>>
>> siis0:<SiI3124 SATA controller>  port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5
>> siis0: [ITHREAD]
>> siisch0:<SIIS channel>  at channel 0 on siis0
>> siisch0: [ITHREAD]
>> siisch1:<SIIS channel>  at channel 1 on siis0
>> siisch1: [ITHREAD]
>> siisch2:<SIIS channel>  at channel 2 on siis0
>> siisch2: [ITHREAD]
>> siisch3:<SIIS channel>  at channel 3 on siis0
>> siisch3: [ITHREAD]
>>
>> # camcontrol devlist
>> <WDC WD2001FASS-00U0B0 01.00101>    at scbus0 target 0 lun 0 (pass0,ada0)
>> <WDC WD2001FASS-00U0B0 01.00101>    at scbus0 target 1 lun 0 (pass1,ada1)
>> <WDC WD2001FASS-00U0B0 01.00101>    at scbus0 target 2 lun 0 (pass2,ada2)
>> <WDC WD2001FASS-00U0B0 01.00101>    at scbus0 target 3 lun 0 (pass3,ada3)
>> <Port Multiplier 47261095 1f06>     at scbus0 target 15 lun 0 (pass4,pmp1)
>> <WDC WD2002FAEX-007BA0 05.01D05>    at scbus1 target 0 lun 0 (pass5,ada4)
>> <WDC WD2002FAEX-007BA0 05.01D05>    at scbus1 target 1 lun 0 (pass6,ada5)
>> <WDC WD2002FAEX-007BA0 05.01D05>    at scbus1 target 2 lun 0 (pass7,ada6)
>> <WDC WD2002FAEX-007BA0 05.01D05>    at scbus1 target 3 lun 0 (pass8,ada7)
>> <WDC WD2002FAEX-007BA0 05.01D05>    at scbus1 target 4 lun 0 (pass9,ada8)
>> <Port Multiplier 37261095 1706>     at scbus1 target 15 lun 0 (pass10,pmp0)
>> <Areca usrvar R001>                 at scbus4 target 0 lun 0 (pass11,da0)
>> <Areca backup1 R001>                at scbus4 target 0 lun 1 (pass12,da1)
>> <Areca RAID controller R001>        at scbus4 target 16 lun 0 (pass13)
>> <AMCC 9650SE-2LP DISK 4.10>         at scbus5 target 0 lun 0 (pass14,da2)
>> <ST31000333AS SD35>                 at scbus6 target 0 lun 0 (pass15,ada9)
>> <ST31000528AS CC35>                 at scbus7 target 0 lun 0 (pass16,ada10)
>> <ST31000340AS SD1A>                 at scbus8 target 0 lun 0 (pass17,ada11)
>> <WDC WD1002FAEX-00Z3A0 05.01D05>    at scbus11 target 0 lun 0 (pass18,ada12)
>>
>>
>> Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with the odd slot timeout error.
>>
>>
>> Feb  7 23:49:32 backup3 kernel: siisch1:  ... waiting for slots 47000000
>> Feb  7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26
>> Feb  7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>> Feb  7 23:49:32 backup3 kernel: siisch1:  ... waiting for slots 43000000
>> Feb  7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30
>> Feb  7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>> Feb  7 23:49:34 backup3 kernel: siisch1:  ... waiting for slots 03000000
>> Feb  7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25
>> Feb  7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>> Feb  7 23:49:34 backup3 kernel: siisch1:  ... waiting for slots 01000000
>> Feb  7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
>> Feb  7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>
> This indicates the controller on channel 1 (siisch1) is "stalled"
> waiting for underlying communication with the device attached to it.
>
>> Feb  7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 01:33:52 backup3 last message repeated 2 times
>> Feb  8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 01:50:31 backup3 last message repeated 2 times
>> Feb  8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
>> Feb  8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT
>
> This indicates the underlying device was handed a READ LOG EXT ATA
> command (command 0x2f) and the device did not respond promptly
> (resulting in the timeout messages you see).

There are hours between the timeouts and the READ LOG EXT errors. They
are not directly related, but may have the same cause.
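
As a side note, the "waiting for slots" values in the quoted timeout
messages look like bitmasks of still-outstanding command slots. A minimal
Python sketch, assuming bit N of the mask stands for slot N (an
interpretation that matches the log above, not something taken from the
driver source; the function name is made up for illustration):

```python
def pending_slots(mask_hex):
    """Return the slot numbers whose bits are set in a hex slot mask."""
    mask = int(mask_hex, 16)
    # Assumption: bit N set means command slot N is still outstanding.
    return [slot for slot in range(32) if mask & (1 << slot)]

# Masks copied from the quoted siisch1 timeout messages.
for mask in ("47000000", "43000000", "03000000", "01000000"):
    print(mask, pending_slots(mask))
```

Under that assumption 47000000 covers slots 24, 25, 26 and 30, which are
exactly the four slots reported as timed out.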

>> smartctl doesn't show any issues on the drives other than one that has some historical errors from a while ago.  What are these errors and do I need to worry about them? The "READ LOG EXT" ones are new.
>>
>> {snipping SMART stats}
>
> You're focused heavily on the READ LOG EXT command.  READ LOG EXT is
> intended for accessing the GP Log section of a drive.  EXT stands for
> "Extended".  "GP Log" means "General Purpose Log", and is where all
> sorts of logging information regarding drive performance is stored.
> It's usually stored within a reserved section of the platters, or in the
> HPA area.  It's not within a "standard" user-accessible LBA/sector
> region.  This is a completely separate log from that of SMART logs.

READ LOG EXT commands are used here to fetch the status of failed NCQ
commands. It is the normal (and only) way to get detailed error status
in that case. A failure of the READ LOG EXT command itself may mean that
this is not a regular media error, but a problem with communication,
firmware, or something else.
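
To illustrate what that log carries, here is a rough Python sketch of
decoding the head of the NCQ Command Error log page (log address 10h)
that READ LOG EXT returns. The byte offsets are an assumption based on
the ATA8-ACS layout of that log; the function name and the sample data
are invented for the example and do not come from this system:

```python
def parse_ncq_error_log(page):
    """Decode the head of a 512-byte NCQ Command Error log page."""
    if len(page) != 512:
        raise ValueError("expected one 512-byte log page")
    lba = 0
    # Assumed layout: LBA bytes at offsets 4-6 (low half) and 8-10 (high half).
    for shift, offset in enumerate((4, 5, 6, 8, 9, 10)):
        lba |= page[offset] << (8 * shift)
    return {
        "non_queued": bool(page[0] & 0x80),  # NQ bit: failed command was not NCQ
        "tag": page[0] & 0x1F,               # NCQ tag of the failed command
        "status": page[2],
        "error": page[3],
        "lba": lba,
        "count": page[12] | (page[13] << 8),
    }

# Made-up sample page: tag 26 failed with status 0x41 and error 0x40.
sample = bytearray(512)
sample[0] = 26
sample[2] = 0x41
sample[3] = 0x40
sample[4:7] = bytes((0x10, 0x32, 0x54))
sample[12] = 8
print(parse_ncq_error_log(sample))
```

When READ LOG EXT itself fails there is no such page to decode, so no
detailed status is available for the failed command.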

> You can review the different types of "logs" on a device by reviewing
> the ATA8-ACS specification here.  See Annex A, section A.1, page 362:
>
> http://www.t13.org/documents/UploadedDocuments/docs2007/D1699r4a-ATA8-ACS.pdf
>
> This is almost certainly a lower level problem with the disk that cannot
> be addressed/solved via normal means.  Thus, my recommendation is to
> replace the disk.
>
> If you would rather not replace the disk, I can try to step you through
> looking at the GPLog sections of the disk to see if you can trigger the
> problem -- and I have a feeling you'll be able to, but I won't
> necessarily be able to tell you where the actual problem lies
> hardware-wise, nor will I be able to solve the problem.
>
> Regarding the repeated errors at semi-regular (but not entirely)
> intervals: are you using smartd?  Do you have a cronjob that issues
> smartctl -a or smartctl -x commands at intervals?  I imagine any of
> these could be tickling something lower level.
>
> Also, please upgrade your smartmontools to 5.42.  It does provide some
> further enhancements that are useful.

-- 
Alexander Motin