Date: Wed, 23 May 2012 12:54:36 -0300
From: "Nenhum_de_Nos" <matheus@eternamente.info>
To: freebsd-stable@freebsd.org
Subject: Re: siis_timeout with port multiplier on 9.0R
Message-ID: <460e1bd626613f125b878f5be65a6b6e.squirrel@eternamente.info>
In-Reply-To: <4FBCF2B6.1060200@sentex.net>
References: <CBE05E47.2E390%mgamble@primustel.ca> <4FBCF2B6.1060200@sentex.net>

On Wed, May 23, 2012 11:22, Mike Tancsa wrote:
> On 5/21/2012 9:04 PM, Matthew Gamble wrote:
>> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5 Port Backplane port multipliers
>> (the "backblaze storage pod"). Under intense IO (ZFS rebuild, presently) the system will lock
>> up all IO for 3-4 minutes and the following entry appears in the dmesg:
>>
>> siisch11: Timeout on slot 30
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>> siisch11: ... waiting for slots 25000000
>> siisch11: Timeout on slot 26
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>> siisch11: ... waiting for slots 21000000
>> siisch11: Timeout on slot 29
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>> siisch11: ... waiting for slots 01000000
>> siisch11: Timeout on slot 24
>> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>>
>> The errors are on different siisch devices so it's not likely to be a SATA cable issue unless
>> multiple cables all went bad at the same time. On the advice of some other posts to the mailing
>> list I've already tried locking the SATA rev to one with the following in /boot/loader.conf
>> which didn't
>
> If they are on different siisch devices then yes, it does not sound like
> a bad cable. However, I have had that issue with similar errors above
> that were fixed by using new cables. If you are using 9.0R, I would
> suggest upgrading to stable. There have been a few bug fixes /
> improvements to the drivers as well as various parts of the disk
> subsystem.
> I have RELENG8 right now and it's quite stable for me on a
> 25TB system which is for the most part similar to 9.x
>
> # zpool status
>   pool: zbackup1
>  state: ONLINE
>  scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zbackup1    ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             ada14   ONLINE       0     0     0
>             ada16   ONLINE       0     0     0
>             ada13   ONLINE       0     0     0
>             ada15   ONLINE       0     0     0
>           raidz1-1  ONLINE       0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>           raidz1-2  ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    ONLINE       0     0     0
>             ada6    ONLINE       0     0     0
>             ada7    ONLINE       0     0     0
>           raidz1-3  ONLINE       0     0     0
>             ada9    ONLINE       0     0     0
>             ada10   ONLINE       0     0     0
>             ada11   ONLINE       0     0     0
>             ada12   ONLINE       0     0     0
>
> errors: No known data errors
> # zpool get all zbackup1
> NAME      PROPERTY       VALUE               SOURCE
> zbackup1  size           25.4T               -
> zbackup1  capacity       68%                 -
> zbackup1  altroot        -                   default
> zbackup1  health         ONLINE              -
> zbackup1  guid           917659042733882722  default
> zbackup1  version        28                  default
> zbackup1  bootfs         -                   default
> zbackup1  delegation     on                  default
> zbackup1  autoreplace    off                 default
> zbackup1  cachefile      -                   default
> zbackup1  failmode       wait                default
> zbackup1  listsnapshots  on                  local
> zbackup1  autoexpand     off                 default
> zbackup1  dedupditto     0                   default
> zbackup1  dedupratio     1.00x               -
> zbackup1  free           7.95T               -
> zbackup1  allocated      17.4T               -
> zbackup1  readonly       off                 -
> zbackup1  comment        -                   default
>
> This is on an Addonics adaptor.

My adapter is this Addonics as well, and my luck is not the same. Is the
host card also a SiI3124 PCI?

I will upgrade to 9-STABLE and try.

thanks,

matheus

>         ---Mike
>>
>> hint.siisch.0.sata_rev=1
>> hint.siisch.1.sata_rev=1
>> hint.siisch.2.sata_rev=1
>> hint.siisch.3.sata_rev=1
>> hint.siisch.4.sata_rev=1
>> hint.siisch.5.sata_rev=1
>> hint.siisch.6.sata_rev=1
>> hint.siisch.7.sata_rev=1
>> hint.siisch.8.sata_rev=1
>> hint.siisch.9.sata_rev=1
>> hint.siisch.10.sata_rev=1
>> hint.siisch.11.sata_rev=1
>>
>> From time to time this is also causing one of the attached drives to go offline:
>>
>> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
>> (ada0:siisch0:0:0:0): lost device
>> (ada0:siisch0:0:0:0): removing device entry
>> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
>> ada0: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
>> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
>> ada0: Command Queueing enabled
>> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
>> ada0: Previously was known as ad4
>> siisch11: Timeout on slot 30
>>
>> When the drive goes offline that causes the ZFS rebuild to restart, and so it's never finishing
>> the rebuild of the array. Does anyone have any insight into what could be causing the timeouts
>> and what we can do to resolve them? Right now my priority is to get the system a bit more
>> stable so the current ZFS rebuild can complete – right now it's been doing the same rebuild
>> for just over 6 days and the timeouts and drive drop offs are causing it to restart constantly.
>>
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
>

--
We will call you Cygnus,
The God of balance you shall be

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

http://en.wikipedia.org/wiki/Posting_style
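
A note on reading the quoted siisch11 log: the hex values after "ss", "rs"
and "waiting for slots" look like per-slot bitmasks, with bit N set while
command slot N is still outstanding. That reading is an assumption drawn
from the log itself (the mask loses exactly one bit after each "Timeout on
slot N" line), not from the driver documentation. The helper name
slots_from_mask below is made up for illustration. A minimal sketch that
decodes the masks:

```python
def slots_from_mask(mask_hex):
    """Return the slot numbers whose bits are set in a 32-bit hex mask.

    Assumption: bit N of the mask corresponds to command slot N.
    """
    mask = int(mask_hex, 16)
    # Walk from the highest slot (31) down to slot 0.
    return [bit for bit in range(31, -1, -1) if mask & (1 << bit)]


if __name__ == "__main__":
    # Mask values copied from the quoted dmesg output.
    for mask in ("65000000", "25000000", "21000000", "01000000", "40000000"):
        print(mask, "->", slots_from_mask(mask))
```

Under that assumption, 65000000 decodes to slots 30, 29, 26 and 24, which
are the four slots that time out on siisch11, and 40000000 on siisch0
decodes to slot 30 alone.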