Date:      Thu, 9 Feb 2012 07:22:40 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Mike Tancsa <mike@sentex.net>
Cc:        Alexander Motin <mav@FreeBSD.org>, freebsd-stable@FreeBSD.org
Subject:   Re: siisch1: Error while READ LOG EXT
Message-ID:  <20120209152240.GA95470@icarus.home.lan>
In-Reply-To: <4F33DB75.1080202@sentex.net>
References:  <4F32E289.4080806@sentex.net> <mailpost.1328736521.3202974.81071.mailing.freebsd.stable@FreeBSD.cs.nctu.edu.tw> <4F32F5B0.2060203@FreeBSD.org> <20120208223819.GA27488@icarus.home.lan> <4F32FB5E.7050102@FreeBSD.org> <4F33DB75.1080202@sentex.net>

On Thu, Feb 09, 2012 at 09:43:01AM -0500, Mike Tancsa wrote:
> On 2/8/2012 5:46 PM, Alexander Motin wrote:
> > 
> > READ LOG EXT for NCQ, same as REQUEST SENSE for ATAPI, is sent by each
> > specific controller driver; in this case by the siis_issue_recovery()
> > function in dev/siis/siis.c. If READ LOG EXT completes properly, the
> > fetched status is returned to CAM together with the original command.
> 
> Hi,
> 	Is there a way to find out which drive is causing these errors ?
> Looking at the logs on the various drives, they all seem to have the odd
> non-zero value.  I suspect it might be a Seagate disk, as smartctl flags
> it as having firmware issues.
> 
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Barracuda 7200.11
> Device Model:     ST31000333AS
> Serial Number:    9TE14SRV
> LU WWN Device Id: 5 000c50 010a39664
> Firmware Version: SD35
> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
> Sector Size:      512 bytes logical/physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Thu Feb  9 09:40:56 2012 EST
> 
> ==> WARNING: There are known problems with these drives,
> see the following Seagate web pages:
> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951
> http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957

The URLs listed cover firmware-level problems with this model of
Seagate drive.  This is a well-known firmware issue that got a lot of
media attention.  The bugs in that firmware, however, would not show up
as the errors you are seeing.

You stated in your original mail that you "added a port multiplier" and
then started getting these errors.  You then provided SMART output for
/dev/ada9, so I assumed you had managed to figure out which device was
causing the problem.

I have to assume that each port multiplier, along with the devices
connected to it, shows up under its own scbusX number.  This is from
your original mail:

> # camcontrol devlist
> <WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 0 lun 0 (pass0,ada0)
> <WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 1 lun 0 (pass1,ada1)
> <WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 2 lun 0 (pass2,ada2)
> <WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 3 lun 0 (pass3,ada3)
> <Port Multiplier 47261095 1f06>    at scbus0 target 15 lun 0 (pass4,pmp1)
> <WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 0 lun 0 (pass5,ada4)
> <WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 1 lun 0 (pass6,ada5)
> <WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 2 lun 0 (pass7,ada6)
> <WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 3 lun 0 (pass8,ada7)
> <WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 4 lun 0 (pass9,ada8)
> <Port Multiplier 37261095 1706>    at scbus1 target 15 lun 0 (pass10,pmp0)
> <Areca usrvar R001>                at scbus4 target 0 lun 0 (pass11,da0)
> <Areca backup1 R001>               at scbus4 target 0 lun 1 (pass12,da1)
> <Areca RAID controller R001>       at scbus4 target 16 lun 0 (pass13)
> <AMCC 9650SE-2LP DISK 4.10>        at scbus5 target 0 lun 0 (pass14,da2)
> <ST31000333AS SD35>                at scbus6 target 0 lun 0 (pass15,ada9)
> <ST31000528AS CC35>                at scbus7 target 0 lun 0 (pass16,ada10)
> <ST31000340AS SD1A>                at scbus8 target 0 lun 0 (pass17,ada11)
> <WDC WD1002FAEX-00Z3A0 05.01D05>   at scbus11 target 0 lun 0 (pass18,ada12)

Based on this, and assuming I understand how this setup works (please
note I could be wrong, since I have no personal familiarity with port
multipliers), the layout looks to me like this:

scbus0
  --> Associated with Port Multiplier device pmp1
      --> Disk ada0
      --> Disk ada1
      --> Disk ada2
      --> Disk ada3

scbus1
  --> Associated with Port Multiplier device pmp0
      --> Disk ada4
      --> Disk ada5
      --> Disk ada6
      --> Disk ada7
      --> Disk ada8

scbus4
  --> Appears to be an Areca controller of some kind, in RAID
      --> Disk da0, volume "usrvar"
      --> Disk da1, volume "backup1"

scbus5
  --> Not sure what this thing is
      --> Disk or "thing" da2

scbus6
  --> Disk ada9

scbus7
  --> Disk ada10

scbus8
  --> Disk ada11

scbus11
  --> Disk ada12

So which Port Multiplier did you add?  The one at scbus0 or scbus1?
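
The grouping above can also be produced mechanically rather than by
eye.  What follows is only a rough sketch in Python, assuming the
"camcontrol devlist" line format shown in the quoted output; the names
group_by_bus and buses_with_port_multiplier are my own invention, and
SAMPLE holds just a few of the quoted lines for illustration:

```python
import re
from collections import defaultdict

# Matches one line of `camcontrol devlist` output as quoted above, e.g.
# <WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 0 lun 0 (pass0,ada0)
DEVLIST_RE = re.compile(
    r"<(?P<ident>[^>]*)>\s+at scbus(?P<bus>\d+) target (?P<target>\d+) "
    r"lun (?P<lun>\d+) \((?P<devs>[^)]*)\)"
)


def group_by_bus(devlist_text):
    """Return {bus number: [(ident string, [device names])]} in input order."""
    buses = defaultdict(list)
    for line in devlist_text.splitlines():
        m = DEVLIST_RE.search(line)
        if m is None:
            continue  # skip anything that is not a device line
        # Drop the passN passthrough names; keep ada/da/pmp names.
        names = [d for d in m.group("devs").split(",")
                 if not d.startswith("pass")]
        buses[int(m.group("bus"))].append((m.group("ident"), names))
    return dict(buses)


def buses_with_port_multiplier(buses):
    """Return the sorted bus numbers that carry a pmpN device."""
    return sorted(
        bus for bus, devs in buses.items()
        if any(n.startswith("pmp") for _, names in devs for n in names)
    )


# A few lines taken from the quoted devlist output (illustrative subset).
SAMPLE = """\
<WDC WD2001FASS-00U0B0 01.00101>   at scbus0 target 0 lun 0 (pass0,ada0)
<Port Multiplier 47261095 1f06>    at scbus0 target 15 lun 0 (pass4,pmp1)
<WDC WD2002FAEX-007BA0 05.01D05>   at scbus1 target 0 lun 0 (pass5,ada4)
<Port Multiplier 37261095 1706>    at scbus1 target 15 lun 0 (pass10,pmp0)
<Areca RAID controller R001>       at scbus4 target 16 lun 0 (pass13)
<ST31000333AS SD35>                at scbus6 target 0 lun 0 (pass15,ada9)
"""

if __name__ == "__main__":
    grouped = group_by_bus(SAMPLE)
    for bus in sorted(grouped):
        print(f"scbus{bus}")
        for ident, names in grouped[bus]:
            print(f"  --> {ident} {names}")
    print("buses with a port multiplier:", buses_with_port_multiplier(grouped))
```

Feeding it the full devlist output instead of SAMPLE should list every
disk that shares a bus with a pmpN device, which are the ones worth
checking first.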

A full dmesg (not just a snippet) would probably be helpful here.  What
you provided in your first post was too terse, especially given how many
disks you have in this system.  :-)

I really see no problem with looking at all disks -- specifically disks
ada0 through ada3, and ada4 through ada8 -- to determine which one may
be having problems.  You're welcome to run "smartctl -a" on each one and
put the output up on the web, preferably one file per disk (e.g.
ada0.txt, ada1.txt, etc.), and I can review them all.
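
If doing that by hand for nine disks is tedious, something along these
lines might help.  It is a minimal sketch in Python, assuming smartctl
is in the PATH and the disks are /dev/ada0 through /dev/ada8 as in the
devlist output; smartctl_jobs and run_jobs are hypothetical helper names
of mine.  As written it only prints the planned commands; calling
run_jobs() on the actual machine (as root) would execute them:

```python
import subprocess


def smartctl_jobs(disks):
    """Return (argv, output filename) pairs, one per disk name."""
    return [(["smartctl", "-a", f"/dev/{d}"], f"{d}.txt") for d in disks]


def run_jobs(jobs):
    """Run each smartctl command and save its stdout to the paired file."""
    for argv, filename in jobs:
        # smartctl's exit status is a bit mask, so a non-zero value is
        # not treated as a fatal error here; the output is saved regardless.
        result = subprocess.run(argv, capture_output=True, text=True)
        with open(filename, "w") as fh:
            fh.write(result.stdout)


# ada0 through ada8: the disks sharing a bus with the two port multipliers.
PMP_DISKS = [f"ada{n}" for n in range(9)]

if __name__ == "__main__":
    # Dry run: show what would be executed and where the output would go.
    for argv, filename in smartctl_jobs(PMP_DISKS):
        print(" ".join(argv), ">", filename)
    # On the FreeBSD host itself: run_jobs(smartctl_jobs(PMP_DISKS))
```

That produces one file per disk, named after the device, which makes
the results easy to compare side by side.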

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |



