From owner-freebsd-stable@FreeBSD.ORG Wed May 23 14:22:56 2012
Message-ID: <4FBCF2B6.1060200@sentex.net>
Date: Wed, 23 May 2012 10:22:46 -0400
From: Mike Tancsa
Organization: Sentex Communications
To: Matthew Gamble
Cc: "freebsd-stable@freebsd.org"
Subject: Re: siis_timeout with port multiplier on 9.0R

On 5/21/2012 9:04 PM, Matthew Gamble wrote:
> We have a box with 3 SiI3124 SATA controllers and 9 CFI-B53PM 5 Port Backplane port multipliers (the "backblaze storage pod"). Under intense IO (ZFS rebuild, presently) the system will lock up all IO for 3-4 minutes and the following entry appears in the dmesg:
>
> siisch11: Timeout on slot 30
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11: ... waiting for slots 25000000
> siisch11: Timeout on slot 26
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11: ... waiting for slots 21000000
> siisch11: Timeout on slot 29
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
> siisch11: ... waiting for slots 01000000
> siisch11: Timeout on slot 24
> siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
>
> The errors are on different siisch devices so it's not likely to be a SATA cable issue unless multiple cables all went bad at the same time. On the advice of some other posts to the mailing list I've already tried locking the SATA rev to one with the following in /boot/loader.conf which didn't

If they are on different siisch devices then yes, it does not sound like a bad cable. However, I have had errors similar to the ones above that were fixed by using new cables.

If you are using 9.0R, I would suggest upgrading to stable. There have been a few bug fixes / improvements to the drivers as well as various parts of the disk subsystem.
I have RELENG8 right now and it's quite stable for me on a 25TB system, which is for the most part similar to 9.x

# zpool status
  pool: zbackup1
 state: ONLINE
  scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011
config:

        NAME        STATE     READ WRITE CKSUM
        zbackup1    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada14   ONLINE       0     0     0
            ada16   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
            ada15   ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1-2  ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1-3  ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0

errors: No known data errors

# zpool get all zbackup1
NAME      PROPERTY       VALUE                SOURCE
zbackup1  size           25.4T                -
zbackup1  capacity       68%                  -
zbackup1  altroot        -                    default
zbackup1  health         ONLINE               -
zbackup1  guid           917659042733882722   default
zbackup1  version        28                   default
zbackup1  bootfs         -                    default
zbackup1  delegation     on                   default
zbackup1  autoreplace    off                  default
zbackup1  cachefile      -                    default
zbackup1  failmode       wait                 default
zbackup1  listsnapshots  on                   local
zbackup1  autoexpand     off                  default
zbackup1  dedupditto     0                    default
zbackup1  dedupratio     1.00x                -
zbackup1  free           7.95T                -
zbackup1  allocated      17.4T                -
zbackup1  readonly       off                  -
zbackup1  comment        -                    default

This is on an Addonics adapter.
	---Mike

>
> hint.siisch.0.sata_rev=1
> hint.siisch.1.sata_rev=1
> hint.siisch.2.sata_rev=1
> hint.siisch.3.sata_rev=1
> hint.siisch.4.sata_rev=1
> hint.siisch.5.sata_rev=1
> hint.siisch.6.sata_rev=1
> hint.siisch.7.sata_rev=1
> hint.siisch.8.sata_rev=1
> hint.siisch.9.sata_rev=1
> hint.siisch.10.sata_rev=1
> hint.siisch.11.sata_rev=1
>
> From time to time this is also causing one of the attached drives to go offline:
>
> siisch0: siis_timeout is 00040000 ss 40000000 rs 40000000 es 00000000 sts 801f2000 serr 00000000
> (ada0:siisch0:0:0:0): lost device
> (ada0:siisch0:0:0:0): removing device entry
> ada0 at siisch0 bus 0 scbus0 target 0 lun 0
> ada0: ATA-8 SATA 3.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> ada0: Previously was known as ad4
> siisch11: Timeout on slot 30
>
> When the drive goes offline that causes the ZFS rebuild to restart, and so it's never finishing the rebuild of the array. Does anyone have any insight into what could be causing the timeouts and what we can do to resolve them? Right now my priority is to get the system a bit more stable so the current ZFS rebuild can complete - right now it's been doing the same rebuild for just over 6 days and the timeouts and drive drop offs are causing it to restart constantly.

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
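
For anyone reading this thread later: the question of whether the timeouts cluster on one siisch channel or spread across several can be checked by tallying the dmesg lines. The sketch below is only an illustration, not part of the original exchange. It assumes the "Timeout on slot N" line format quoted in this message, and it assumes the hex values after "ss" and "waiting for slots" are per-slot bitmasks, which the quoted sequence (65000000, then 25000000, 21000000, 01000000 as slots 30, 26 and 29 time out) is consistent with. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "siisch11: Timeout on slot 30" (format taken from
# the dmesg excerpt quoted in this thread).
TIMEOUT_RE = re.compile(r"(siisch\d+): Timeout on slot (\d+)")


def slots_in_mask(mask_hex):
    """Return the slot numbers whose bits are set in a hex mask.

    Assumption: the mask is a 32-bit value with one bit per command slot.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


def timeouts_per_channel(dmesg_text):
    """Count "Timeout on slot" events per siisch channel."""
    return Counter(m.group(1) for m in TIMEOUT_RE.finditer(dmesg_text))


# Sample lines copied from the quoted dmesg output.
sample = """\
siisch11: Timeout on slot 30
siisch11: siis_timeout is 00040000 ss 65000000 rs 65000000 es 00000000 sts 80192000 serr 00000000
siisch11: ... waiting for slots 25000000
siisch11: Timeout on slot 26
"""

print(slots_in_mask("65000000"))   # [24, 26, 29, 30]
print(timeouts_per_channel(sample))
```

Feeding the full output of dmesg into timeouts_per_channel gives a per-channel count. A count dominated by one channel would point toward a single cable or port multiplier, while an even spread would fit the driver or controller explanation discussed above.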