Date: Thu, 22 Aug 2013 08:21:07 -0600
From: "Kenneth D. Merry" <ken@freebsd.org>
To: Dmitry Morozovsky <marck@rinet.ru>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r254615 - head/sys/dev/mps
Message-ID: <20130822142107.GA49996@nargothrond.kdm.org>
In-Reply-To: <alpine.BSF.2.00.1308221641010.10197@woozle.rinet.ru>
References: <201308212130.r7LLUvO5008991@svn.freebsd.org> <alpine.BSF.2.00.1308221641010.10197@woozle.rinet.ru>

On Thu, Aug 22, 2013 at 16:42:41 +0400, Dmitry Morozovsky wrote:
> Ken,
>
> On Wed, 21 Aug 2013, Kenneth D. Merry wrote:
>
> > Author: ken
> > Date: Wed Aug 21 21:30:56 2013
> > New Revision: 254615
> > URL: http://svnweb.freebsd.org/changeset/base/254615
> >
> > Log:
> >   Fix mps(4) driver breakage that came in in change 253550 that
> >   manifested itself in out of chain frame conditions.
> >
> >   When the driver ran out of chain frames, the request in question
> >   would get completed early, and go through mpssas_scsiio_complete().
> >
> >   In mpssas_scsiio_complete(), the negation of the CAM status values
> >   (CAM_STATUS_MASK | CAM_SIM_QUEUED) was ORed in instead of being
> >   ANDed in.  This resulted in a bogus CAM CCB status value.  This
> >   didn't show up in the non-error case, because the status was reset
> >   to something valid (e.g. CAM_REQ_CMP) later on in the function.
> >
> >   But in the error case, such as when the driver ran out of chain
> >   frames, the CAM_REQUEUE_REQ status was ORed in to the bogus status
> >   value.  This led to the CAM transport layer repeatedly releasing
> >   the SIM queue, because it though that the CAM_RELEASE_SIMQ flag had
> >   been set.  The symptom was messages like this on the console when
> >   INVARIANTS were enabled:
> >
> >   xpt_release_simq: requested 1 > present 0
> >   xpt_release_simq: requested 1 > present 0
> >   xpt_release_simq: requested 1 > present 0
>
> what is real impact of the bug?

Your system will essentially hang, certainly as far as anything connected
to the controller in question is concerned.

> >   mps_sas.c:	In mpssas_scsiio_complete(), use &= to take status
> >   		bits out.  |= adds them in.
> >
> >   		In the error case in mpssas_scsiio_complete(), set
> >   		the status to CAM_REQUEUE_REQ, don't OR it in.
> >
> >   MFC after:	3 days
>
> This patch does not apply cleanly as r253550 had not been merged, and the first
> masking does not occur on contemporary stable/9.  Comments?

As far as I know, this is not a problem on the version of the driver in
stable/9.

But then again, I have not tested the out of chain frames code since early
2011 when I last fixed it.

If you want to verify the behavior is correct in stable/9, do this:

1. enable INVARIANTS

2. In /boot/loader.conf:

hw.mps.max_chains=32

3. Use up most of your memory.  If you're using ZFS, just do a sequential
   write to a file so that the ARC starts filling up with cached data.  Look
   at the free memory in top to see how much you've used.  This will cause
   enough fragmentation to lead to more scatter/gather segments getting used
   in the driver.

4. Do something like this:

((i=0)); while [ $i -lt 60 ]; do dd if=/dev/da0 of=/dev/null bs=1m & ((i++)); done

5. Look for an out of chain frames message on the console.

To see how far you are towards using the chain frames, run 'sysctl dev.mps'.
You can see how many chain frames you have free, and how many requests have
failed.

This change just needs to be merged along with the other changes to avoid
having the regression in stable.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
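
For readers following the commit log quoted above, the masking mistake can be shown in isolation. This is only a sketch, not the code from mps_sas.c: buggy_complete(), fixed_complete() and check_status_handling() are hypothetical names, and the constant values are local stand-ins that I believe mirror sys/cam/cam.h, so check them against the header before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-ins for the CAM status values (assumed to match
 * sys/cam/cam.h; verify against the real header).
 */
#define CAM_REQ_INPROG   0x00u   /* request in progress */
#define CAM_REQUEUE_REQ  0x13u   /* ask the transport layer to requeue */
#define CAM_STATUS_MASK  0x3Fu   /* low bits hold the status code */
#define CAM_RELEASE_SIMQ 0x100u  /* flag: release the SIM queue */
#define CAM_SIM_QUEUED   0x200u  /* flag: request is queued in the SIM */

/* Hypothetical helper: the error path as described before the fix. */
uint32_t
buggy_complete(uint32_t status)
{
	/* |= with the negated mask sets every bit outside the mask. */
	status |= ~(CAM_STATUS_MASK | CAM_SIM_QUEUED);
	/* ORing the new status into that value keeps all the stray bits. */
	status |= CAM_REQUEUE_REQ;
	return (status);
}

/* Hypothetical helper: the error path as described after the fix. */
uint32_t
fixed_complete(uint32_t status)
{
	/* &= with the negated mask clears the status code and the queued flag. */
	status &= ~(CAM_STATUS_MASK | CAM_SIM_QUEUED);
	/* Assign the status instead of ORing it in. */
	status = CAM_REQUEUE_REQ;
	return (status);
}

/* Self-check: only the buggy path leaves CAM_RELEASE_SIMQ set. */
void
check_status_handling(void)
{
	uint32_t start = CAM_REQ_INPROG | CAM_SIM_QUEUED;

	assert((buggy_complete(start) & CAM_RELEASE_SIMQ) != 0);
	assert((fixed_complete(start) & CAM_RELEASE_SIMQ) == 0);
	assert(fixed_complete(start) == CAM_REQUEUE_REQ);
}
```

Starting from a queued, in-progress request (0x200), the buggy path yields 0xFFFFFFD3, which has the CAM_RELEASE_SIMQ bit set, while the fixed path yields 0x13 (CAM_REQUEUE_REQ) with no flag bits. That stray flag is consistent with the repeated xpt_release_simq messages in the log.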