Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 09 Feb 2012 00:46:54 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: siisch1: Error while READ LOG EXT
Message-ID:  <4F32FB5E.7050102@FreeBSD.org>
In-Reply-To: <20120208223819.GA27488@icarus.home.lan>
References:  <4F32E289.4080806@sentex.net> <mailpost.1328736521.3202974.81071.mailing.freebsd.stable@FreeBSD.cs.nctu.edu.tw> <4F32F5B0.2060203@FreeBSD.org> <20120208223819.GA27488@icarus.home.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On 09.02.2012 00:38, Jeremy Chadwick wrote:
> On Thu, Feb 09, 2012 at 12:22:40AM +0200, Alexander Motin wrote:
>> On 08.02.2012 23:27, Jeremy Chadwick wrote:
>>> On Wed, Feb 08, 2012 at 04:00:57PM -0500, Mike Tancsa wrote:
>>>> I have a 4 port eSata PCIe card with 3 external port multipliers attached on an AMD64 box (8G of RAM), RELENG8 from Feb1st.
>>>>
>>>> siis0@pci0:5:0:0:       class=0x010400 card=0x71241095 chip=0x31241095 rev=0x02 hdr=0x00
>>>>      vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
>>>>      device     = 'PCI-X to Serial ATA Controller (SiI 3124)'
>>>>      class      = mass storage
>>>>      subclass   = RAID
>>>>      bar   [10] = type Memory, range 64, base 0xb4408000, size 128, enabled
>>>>      bar   [18] = type Memory, range 64, base 0xb4400000, size 32768, enabled
>>>>      bar   [20] = type I/O Port, range 32, base 0x3000, size 16, enabled
>>>>      cap 01[64] = powerspec 2  supports D0 D1 D2 D3  current D0
>>>>      cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
>>>>      cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message
>>>>
>>>> siis0:<SiI3124 SATA controller>   port 0x3000-0x300f mem 0xb4408000-0xb440807f,0xb4400000-0xb4407fff irq 19 at device 0.0 on pci5
>>>> siis0: [ITHREAD]
>>>> siisch0:<SIIS channel>   at channel 0 on siis0
>>>> siisch0: [ITHREAD]
>>>> siisch1:<SIIS channel>   at channel 1 on siis0
>>>> siisch1: [ITHREAD]
>>>> siisch2:<SIIS channel>   at channel 2 on siis0
>>>> siisch2: [ITHREAD]
>>>> siisch3:<SIIS channel>   at channel 3 on siis0
>>>> siisch3: [ITHREAD]
>>>>
>>>> # camcontrol devlist
>>>> <WDC WD2001FASS-00U0B0 01.00101>     at scbus0 target 0 lun 0 (pass0,ada0)
>>>> <WDC WD2001FASS-00U0B0 01.00101>     at scbus0 target 1 lun 0 (pass1,ada1)
>>>> <WDC WD2001FASS-00U0B0 01.00101>     at scbus0 target 2 lun 0 (pass2,ada2)
>>>> <WDC WD2001FASS-00U0B0 01.00101>     at scbus0 target 3 lun 0 (pass3,ada3)
>>>> <Port Multiplier 47261095 1f06>      at scbus0 target 15 lun 0 (pass4,pmp1)
>>>> <WDC WD2002FAEX-007BA0 05.01D05>     at scbus1 target 0 lun 0 (pass5,ada4)
>>>> <WDC WD2002FAEX-007BA0 05.01D05>     at scbus1 target 1 lun 0 (pass6,ada5)
>>>> <WDC WD2002FAEX-007BA0 05.01D05>     at scbus1 target 2 lun 0 (pass7,ada6)
>>>> <WDC WD2002FAEX-007BA0 05.01D05>     at scbus1 target 3 lun 0 (pass8,ada7)
>>>> <WDC WD2002FAEX-007BA0 05.01D05>     at scbus1 target 4 lun 0 (pass9,ada8)
>>>> <Port Multiplier 37261095 1706>      at scbus1 target 15 lun 0 (pass10,pmp0)
>>>> <Areca usrvar R001>                  at scbus4 target 0 lun 0 (pass11,da0)
>>>> <Areca backup1 R001>                 at scbus4 target 0 lun 1 (pass12,da1)
>>>> <Areca RAID controller R001>         at scbus4 target 16 lun 0 (pass13)
>>>> <AMCC 9650SE-2LP DISK 4.10>          at scbus5 target 0 lun 0 (pass14,da2)
>>>> <ST31000333AS SD35>                  at scbus6 target 0 lun 0 (pass15,ada9)
>>>> <ST31000528AS CC35>                  at scbus7 target 0 lun 0 (pass16,ada10)
>>>> <ST31000340AS SD1A>                  at scbus8 target 0 lun 0 (pass17,ada11)
>>>> <WDC WD1002FAEX-00Z3A0 05.01D05>     at scbus11 target 0 lun 0 (pass18,ada12)
>>>>
>>>>
>>>> Ever since I added a new PM, I have been seeing a new error (READ LOG EXT) along with a the odd slot timeout error.
>>>>
>>>>
>>>> Feb  7 23:49:32 backup3 kernel: siisch1:  ... waiting for slots 47000000
>>>> Feb  7 23:49:32 backup3 kernel: siisch1: Timeout on slot 26
>>>> Feb  7 23:49:32 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>>>> Feb  7 23:49:32 backup3 kernel: siisch1:  ... waiting for slots 43000000
>>>> Feb  7 23:49:34 backup3 kernel: siisch1: Timeout on slot 30
>>>> Feb  7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>>>> Feb  7 23:49:34 backup3 kernel: siisch1:  ... waiting for slots 03000000
>>>> Feb  7 23:49:34 backup3 kernel: siisch1: Timeout on slot 25
>>>> Feb  7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>>>> Feb  7 23:49:34 backup3 kernel: siisch1:  ... waiting for slots 01000000
>>>> Feb  7 23:49:34 backup3 kernel: siisch1: Timeout on slot 24
>>>> Feb  7 23:49:34 backup3 kernel: siisch1: siis_timeout is 07040000 ss 7f17e8b9 rs 7f17e8b9 es 00000000 sts 801d2000 serr 00680000
>>>
>>> This indicates the controller on channel 1 (siisch1) is "stalled"
>>> waiting for underlying communication with the device attached to it.
>>>
>>>> Feb  7 23:57:59 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 00:13:36 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 00:21:53 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 00:22:16 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 00:39:13 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 01:24:25 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 01:33:52 backup3 last message repeated 2 times
>>>> Feb  8 01:43:45 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 01:50:31 backup3 last message repeated 2 times
>>>> Feb  8 01:55:20 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 02:26:26 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 02:27:24 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 03:16:28 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 03:36:20 backup3 kernel: siisch1: Error while READ LOG EXT
>>>> Feb  8 04:04:05 backup3 kernel: siisch1: Error while READ LOG EXT
>>>
>>> This indicates the underlying device was handed a READ LOG EXT ATA
>>> command (command 0x2f) and the device did not respond promptly
>>> (resulting in the timeout messages you see).
>>
>> There are hours between timeouts and READ LOG EXT errors. they are
>> not directly related, but may have the same reason.
>>
>>>> smartctl doesnt show any issues on the drives other than one that has some historical errors from a while ago.  What are these errors and do I need to worry about them ? The "READ LOG EXT" ones are new.
>>>>
>>>> {snipping SMART stats}
>>>
>>> You're focused heavily on the READ LOG EXT command.  READ LOG EXT is
>>> intended for accessing the GP Log section of a drive.  EXT stands for
>>> "Extended".  "GP Log" means "General Purpose Log", and is where all
>>> sorts of logging information regarding drive performance is stored.
>>> It's usually stored within a reserved section of the platters, or in the
>>> HPA area.  It's not within a "standard" user-accessible LBA/sector
>>> region.  This is a completely separate log from that of SMART logs.
>>
>> READ LOG EXT commands here used to fetch status of some failed NCQ
>> commands. It is normal (the only) way to get detailed error status
>> in that case. Error of the READ LOG EXT commands may mean that it is
>> not regular media error, but may be problem with communication,
>> firmware or something else.
>
> As usual -- thanks for your fantastic insights, Alexander.
>
> I wasn't aware that to get NCQ error conditions you had to read from the
> GPLog region, I was under the impression this was somehow returned via
> extended SATA status codes (since I with NCQ the internal ordering CDBs
> changes).  Interesting.  NCQ is still something (on an ATA and OS level,
> for lack of better terminology) I am not fully familiar with, though
> functionally I understand what it provides/gains.  Where in the ata-cam
> code is this done?  I'd like to look at it for educational reasons.

READ LOG EXT for NCQ, same as REQUEST SENSE for ATAPI sent by every 
specific controller driver. In this case by siis_issue_recovery() 
function in dev/siis/siis.c. In case of proper READ LOG EXT completion, 
fetched status returned to CAM together with original command.

-- 
Alexander Motin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4F32FB5E.7050102>