Date: Thu, 29 Apr 2010 16:55:12 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Pete French <petefrench@ticketswitch.com>
Cc: freebsd-scsi@freebsd.org, scottl@FreeBSD.org, freebsd-stable@freebsd.org
Subject: Re: MFC of "Large set of CAM improvements" breaks I/O to Adaptec 29160 SCSI controller
Message-ID: <4BD98FC0.2030206@FreeBSD.org>
In-Reply-To: <4BD896AC.4080509@FreeBSD.org>
References: <E1O7AGM-0005ux-8E@dilbert.ticketswitch.com> <4BD896AC.4080509@FreeBSD.org>
This is a multi-part message in MIME format.
--------------080401080207070003020809
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Alexander Motin wrote:
> Pete French wrote:
>>> I have some 29160N locally and I'll try to reproduce this.
>> I would suggest you try gmirror across two drives - that is how
>> both myself and the original poster first noticed the issue.
>
> Thanks. First step successful - I can steadily reproduce problem on
> CURRENT. raidtest with 200 I/O streams over gmirror of two disks on same
> channel triggers issue in seconds. Any I/O on channel dying after both
> disks report "Queue full" error same time. The rest of system works
> fine. If I preliminarily manually adjust queue depth of one disk -
> everything works fine. I'll investigate it tomorrow.

It seems I've found the cause. The attached patch fixes the problem for
me. This call was removed by mistake in the specified commit. It is not
needed during normal operation, only when the device queue is shrinking.
Even in that case, the problem often wasn't triggered if there were more
requests and the controller's request allocation queue wasn't exhausted
at that moment. That's why the problem wasn't detected and why gmirror
increased its chances.

-- 
Alexander Motin

--------------080401080207070003020809
Content-Type: text/plain;
 name="cam.sched_fix.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="cam.sched_fix.patch"

--- cam_xpt.c.prev	2010-04-28 08:15:40.000000000 +0300
+++ cam_xpt.c	2010-04-29 16:01:23.000000000 +0300
@@ -4903,6 +4903,10 @@ camisr_runqueue(void *V_queue)
 			if ((dev->flags & CAM_DEV_TAG_AFTER_COUNT) != 0
 			 && (--dev->tag_delay_count == 0))
 				xpt_start_tags(ccb_h->path);
+			if (!device_is_send_queued(dev)) {
+				runq = xpt_schedule_dev_sendq(ccb_h->path->bus,
+							      dev);
+			}
 		}
 
 		if (ccb_h->status & CAM_RELEASE_SIMQ) {

--------------080401080207070003020809--
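
For readers who want to see the stall in isolation, here is a rough toy
model in C of the scenario described in the message above: a device has
requests in flight, its queue depth shrinks after a "Queue full", and
nothing puts it back on the send queue when completions arrive. All
names here (toy_dev, toy_schedule, toy_run, toy_complete, toy_simulate)
are hypothetical and are not the real CAM structures or functions. The
model only covers the queue-shrinking case and does not try to reproduce
why normal operation is unaffected in the real code.

```c
#include <stdbool.h>

/* Hypothetical toy model; NOT the real CAM data structures. */
struct toy_dev {
	int pending;	/* requests waiting to be sent to the device */
	int active;	/* requests currently outstanding on the device */
	int openings;	/* how many requests the device accepts at once */
	bool queued;	/* is the device on the send run queue? */
};

/* Put the device on the run queue if it has work and a free slot. */
static bool
toy_schedule(struct toy_dev *d)
{
	if (!d->queued && d->pending > 0 && d->active < d->openings) {
		d->queued = true;
		return (true);
	}
	return (false);
}

/* Service the run queue: send while slots are free, then drop off it. */
static void
toy_run(struct toy_dev *d)
{
	if (!d->queued)
		return;
	d->queued = false;
	while (d->pending > 0 && d->active < d->openings) {
		d->pending--;
		d->active++;
	}
}

/* Completion handler; 'resched' selects whether the device is
 * rescheduled on completion (the idea behind the patch). */
static void
toy_complete(struct toy_dev *d, bool resched)
{
	d->active--;
	if (resched && toy_schedule(d))
		toy_run(d);
}

/* Returns the number of requests left stuck in 'pending'. */
static int
toy_simulate(bool resched)
{
	struct toy_dev d = { .pending = 8, .active = 0,
	    .openings = 4, .queued = false };

	toy_schedule(&d);
	toy_run(&d);		/* 4 requests in flight, 4 still pending */
	d.openings = 2;		/* "Queue full": queue depth shrinks */
	while (d.active > 0)	/* no new submissions arrive meanwhile */
		toy_complete(&d, resched);
	return (d.pending);
}
```

In this toy, toy_simulate(false) returns with requests still pending and
nothing left to wake the device up, while toy_simulate(true) drains the
queue completely. A fresh submission that called toy_schedule() would
also hide the stall, which loosely matches the observation that the
problem was rarely triggered when more requests kept arriving.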