Date: Thu, 16 Sep 2010 13:58:13 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: a.smith@ukgrid.net
Cc: freebsd-fs@freebsd.org, Andriy Gapon <avg@icyb.net.ua>
Subject: Re: ZFS related kernel panic
Message-ID: <4C91F845.4010100@FreeBSD.org>
In-Reply-To: <4C8A7B20.7090408@FreeBSD.org>
References: <20100909140000.5744370gkyqv4eo0@webmail2.ukgrid.net>
 <20100909182318.11133lqu4q4u1mw4@webmail2.ukgrid.net>
 <4C89D6A8.1080107@icyb.net.ua>
 <20100910143900.20382xl5bl6oo9as@webmail2.ukgrid.net>
 <20100910141127.GA13056@icarus.home.lan>
 <20100910155510.11831w104qjpyc4g@webmail2.ukgrid.net>
 <20100910152544.GA14636@icarus.home.lan>
 <20100910173912.205969tzhjiovf8c@webmail2.ukgrid.net>
 <4C8A6B26.8050305@icyb.net.ua>
 <20100910184921.16956kbaskhrsmg4@webmail2.ukgrid.net>
 <4C8A7B20.7090408@FreeBSD.org>
Alexander Motin wrote:
> It looks like during timeout handling (it is quite complicated process
> when port multiplier is used) some request was completed twice. So
> original problem is probably in hardware (try to check/replace cables,
> multiplier, ...), that caused timeout, but the fact that drive was
> unable to handle it is probably a siis(4) driver bug.

Thanks to the console access provided, I have found the cause of the
crash. The attached patch should fix it. The patched system has now
been running the stress test successfully for 45 minutes, whereas
without the patch it crashed within a few minutes.

I have also found that the timeouts reported by the driver are not
fatal. The affected commands complete correctly as soon as the driver,
after detecting the timeout, freezes new incoming requests to resolve
the situation and, as a result, lets the bus go idle. I think these
timeouts are caused by congestion on the SATA interface, probably due
to the port multiplier. This panic could be triggered only by such
fake timeouts, not by real ones.

-- 
Alexander Motin

[Attachment: siis.c.patch]
--- siis.c.debug	2010-09-16 11:11:59.000000000 +0100
+++ siis.c	2010-09-16 11:12:31.000000000 +0100
@@ -1209,6 +1209,7 @@ siis_end_transaction(struct siis_slot *s
 	device_t dev = slot->dev;
 	struct siis_channel *ch = device_get_softc(dev);
 	union ccb *ccb = slot->ccb;
+	int lastto;
 
 	mtx_assert(&ch->mtx, MA_OWNED);
 	bus_dmamap_sync(ch->dma.work_tag, ch->dma.work_map,
@@ -1292,11 +1293,6 @@ siis_end_transaction(struct siis_slot *s
 	ch->oslots &= ~(1 << slot->slot);
 	ch->rslots &= ~(1 << slot->slot);
 	ch->aslots &= ~(1 << slot->slot);
-	if (et != SIIS_ERR_TIMEOUT) {
-		if (ch->toslots == (1 << slot->slot))
-			xpt_release_simq(ch->sim, TRUE);
-		ch->toslots &= ~(1 << slot->slot);
-	}
 	slot->state = SIIS_SLOT_EMPTY;
 	slot->ccb = NULL;
 	/* Update channel stats. */
@@ -1305,6 +1301,13 @@ siis_end_transaction(struct siis_slot *s
 	    (ccb->ataio.cmd.flags & CAM_ATAIO_FPDMA)) {
 		ch->numtslots[ccb->ccb_h.target_id]--;
 	}
+	/* Cancel timeout state if request completed normally. */
+	if (et != SIIS_ERR_TIMEOUT) {
+		lastto = (ch->toslots == (1 << slot->slot));
+		ch->toslots &= ~(1 << slot->slot);
+		if (lastto)
+			xpt_release_simq(ch->sim, TRUE);
+	}
 	/* If it was our READ LOG command - process it. */
 	if (ch->readlog) {
 		siis_process_read_log(dev, ccb);