Date:      Wed, 23 May 2012 12:54:36 -0300
From:      "Nenhum_de_Nos" <matheus@eternamente.info>
To:        freebsd-stable@freebsd.org
Subject:   Re: siis_timeout with port multiplier on 9.0R
Message-ID:  <460e1bd626613f125b878f5be65a6b6e.squirrel@eternamente.info>
In-Reply-To: <4FBCF2B6.1060200@sentex.net>
References:  <CBE05E47.2E390%mgamble@primustel.ca> <4FBCF2B6.1060200@sentex.net>

On Wed, May 23, 2012 11:22, Mike Tancsa wrote:
> On 5/21/2012 9:04 PM, Matthew Gamble wrote:
>> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5 Port Backplane port multipliers
>> (the "backblaze storage pod").  Under intense IO (ZFS rebuild, presently) the system will lock
>> up all IO for 3-4 minutes and the following entry appears in the dmesg:
>>
>> siisch11: Timeout on slot 30
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>> siisch11:  ... waiting for slots 25000000
>> siisch11: Timeout on slot 26
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>> siisch11:  ... waiting for slots 21000000
>> siisch11: Timeout on slot 29
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>> siisch11:  ... waiting for slots 01000000
>> siisch11: Timeout on slot 24
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>
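One way to check how the timeouts are spread across channels is to tally them per siisch device. Below is a minimal sketch in Python, assuming the dmesg text is available as a string. The helper name count_timeouts is hypothetical, and the pattern is inferred only from the "Timeout on slot" lines quoted here, not from the driver source.

```python
import re
from collections import Counter

# Pattern inferred from the quoted dmesg lines, e.g. "siisch11: Timeout on slot 30".
TIMEOUT_RE = re.compile(r"(siisch\d+): Timeout on slot (\d+)")

def count_timeouts(dmesg_text):
    """Return a Counter mapping channel name -> number of slot timeouts seen."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = TIMEOUT_RE.search(line)  # search() tolerates "> " quote prefixes
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample taken from the log excerpt in this thread.
sample = """\
siisch11: Timeout on slot 30
siisch11:  ... waiting for slots 25000000
siisch11: Timeout on slot 26
siisch11:  ... waiting for slots 21000000
siisch11: Timeout on slot 29
siisch11:  ... waiting for slots 01000000
siisch11: Timeout on slot 24
"""

for channel, hits in sorted(count_timeouts(sample).items()):
    print(channel, hits)
```

If the full log shows hits on many different channels, a single bad cable becomes a less likely explanation.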
>> The errors are on different siisch devices so it's not likely to be a SATA cable issue unless
>> multiple cables all went bad at the same time.  On the advice of some other posts to the mailing
>> list I've already tried locking the SATA rev to one with the following in /boot/loader.conf
>> which didn't
>
> If they are on different siisch devices then yes, it does not sound like
> a bad cable. However, I have had that issue with similar errors above
> that were fixed by using new cables.  If you are using 9.0R, I would
> suggest upgrading to stable. There have been a few bug fixes /
> improvements to the drivers as well as various parts of the disk
> subsystem. I have RELENG8 right now and it's quite stable for me on a
> 25TB system which is for the most part similar to 9.x
>
> # zpool status
>   pool: zbackup1
>  state: ONLINE
>   scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zbackup1    ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             ada14   ONLINE       0     0     0
>             ada16   ONLINE       0     0     0
>             ada13   ONLINE       0     0     0
>             ada15   ONLINE       0     0     0
>           raidz1-1  ONLINE       0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>           raidz1-2  ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    ONLINE       0     0     0
>             ada6    ONLINE       0     0     0
>             ada7    ONLINE       0     0     0
>           raidz1-3  ONLINE       0     0     0
>             ada9    ONLINE       0     0     0
>             ada10   ONLINE       0     0     0
>             ada11   ONLINE       0     0     0
>             ada12   ONLINE       0     0     0
>
> errors: No known data errors
> # zpool get all zbackup1
> NAME      PROPERTY       VALUE       SOURCE
> zbackup1  size           25.4T       -
> zbackup1  capacity       68%         -
> zbackup1  altroot        -           default
> zbackup1  health         ONLINE      -
> zbackup1  guid           917659042733882722  default
> zbackup1  version        28          default
> zbackup1  bootfs         -           default
> zbackup1  delegation     on          default
> zbackup1  autoreplace    off         default
> zbackup1  cachefile      -           default
> zbackup1  failmode       wait        default
> zbackup1  listsnapshots  on          local
> zbackup1  autoexpand     off         default
> zbackup1  dedupditto     0           default
> zbackup1  dedupratio     1.00x       -
> zbackup1  free           7.95T       -
> zbackup1  allocated      17.4T       -
> zbackup1  readonly       off         -
> zbackup1  comment        -           default
>
> This is on an Addonics adapter.

My adapter is an Addonics as well, but I have not had the same luck. Is your host card also a SiI3124 PCI?

I will upgrade to 9-STABLE and try.

thanks,

matheus

> 	---Mike
>>
>> hint.siisch.0.sata_rev=1
>> hint.siisch.1.sata_rev=1
>> hint.siisch.2.sata_rev=1
>> hint.siisch.3.sata_rev=1
>> hint.siisch.4.sata_rev=1
>> hint.siisch.5.sata_rev=1
>> hint.siisch.6.sata_rev=1
>> hint.siisch.7.sata_rev=1
>> hint.siisch.8.sata_rev=1
>> hint.siisch.9.sata_rev=1
>> hint.siisch.10.sata_rev=1
>> hint.siisch.11.sata_rev=1
>>
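The hint lines above all follow one pattern, so they can be generated rather than typed by hand. Below is a small sketch in Python, assuming 12 channels (siisch0 to siisch11, as listed) and SATA revision 1. The helper name sata_rev_hints is hypothetical.

```python
def sata_rev_hints(channels=12, rev=1):
    """Build loader.conf lines that pin each siisch channel to one SATA revision."""
    return ["hint.siisch.%d.sata_rev=%d" % (ch, rev) for ch in range(channels)]

# Print the lines so they can be reviewed before being added to /boot/loader.conf.
print("\n".join(sata_rev_hints()))
```

The output should be reviewed before it goes into /boot/loader.conf, and the channel count adjusted to match the number of siisch devices on the box.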
>> From time to time this is also causing one of the attached drives to go offline:
>>
>> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
>> (ada0:siisch0:0:0:0): lost device
>> (ada0:siisch0:0:0:0): removing device entry
>> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
>> ada0: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
>> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
>> ada0: Command Queueing enabled
>> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
>> ada0: Previously was known as ad4
>> siisch11: Timeout on slot 30
>>
>> When the drive goes offline that causes the ZFS rebuild to restart, and so it's never finishing
>> the rebuild of the array.  Does anyone have any insight into what could be causing the timeouts
>> and what we can do to resolve them?  Right now my priority is to get the system a bit more
>> stable so the current ZFS rebuild can complete – right now it's been doing the same rebuild
>> for just over 6 days and the timeouts and drive drop offs are causing it to restart constantly.
>>
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/
>


-- 
We will call you Cygnus,
The God of balance you shall be

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

http://en.wikipedia.org/wiki/Posting_style
