From: Alexander Motin
Date: Thu, 16 Sep 2010 13:58:13 +0300
Message-ID: <4C91F845.4010100@FreeBSD.org>
To: a.smith@ukgrid.net
In-Reply-To: <4C8A7B20.7090408@FreeBSD.org>
Cc: freebsd-fs@freebsd.org, Andriy Gapon
Subject: Re: ZFS related kernel panic

Alexander Motin wrote:
> It looks like during timeout handling (it is quite complicated process
> when port multiplier is used) some request was completed twice. So
> original problem is probably in hardware (try to check/replace cables,
> multiplier, ...), that caused timeout, but the fact that drive was
> unable to handle it is probably a siis(4) driver bug.

Thanks to the console access provided, I have found the cause of the crash. The attached patch should fix it. The patched system has now been running the stress test successfully for 45 minutes, whereas without the patch it crashed within a few minutes.

I have also found that the timeouts reported by the driver are not fatal. After detecting a timeout, the driver freezes new incoming requests to resolve the situation; as a result the bus goes idle, and the affected commands then complete correctly. I think these timeouts are caused by some congestion on the SATA interface, probably due to the port multiplier. This panic could be triggered only by such spurious timeouts, not by real ones.

-- 
Alexander Motin

[Attachment: siis.c.patch]

--- siis.c.debug	2010-09-16 11:11:59.000000000 +0100
+++ siis.c	2010-09-16 11:12:31.000000000 +0100
@@ -1209,6 +1209,7 @@ siis_end_transaction(struct siis_slot *s
 	device_t dev = slot->dev;
 	struct siis_channel *ch = device_get_softc(dev);
 	union ccb *ccb = slot->ccb;
+	int lastto;
 
 	mtx_assert(&ch->mtx, MA_OWNED);
 	bus_dmamap_sync(ch->dma.work_tag, ch->dma.work_map,
@@ -1292,11 +1293,6 @@ siis_end_transaction(struct siis_slot *s
 	ch->oslots &= ~(1 << slot->slot);
 	ch->rslots &= ~(1 << slot->slot);
 	ch->aslots &= ~(1 << slot->slot);
-	if (et != SIIS_ERR_TIMEOUT) {
-		if (ch->toslots == (1 << slot->slot))
-			xpt_release_simq(ch->sim, TRUE);
-		ch->toslots &= ~(1 << slot->slot);
-	}
 	slot->state = SIIS_SLOT_EMPTY;
 	slot->ccb = NULL;
 	/* Update channel stats. */
@@ -1305,6 +1301,13 @@ siis_end_transaction(struct siis_slot *s
 	    (ccb->ataio.cmd.flags & CAM_ATAIO_FPDMA)) {
 		ch->numtslots[ccb->ccb_h.target_id]--;
 	}
+	/* Cancel timeout state if request completed normally. */
+	if (et != SIIS_ERR_TIMEOUT) {
+		lastto = (ch->toslots == (1 << slot->slot));
+		ch->toslots &= ~(1 << slot->slot);
+		if (lastto)
+			xpt_release_simq(ch->sim, TRUE);
+	}
 	/* If it was our READ LOG command - process it. */
 	if (ch->readlog) {
 		siis_process_read_log(dev, ccb);