Date: Tue, 6 Nov 2012 10:01:52 -0800
From: Doug Ambrisko <ambrisko@ambrisko.com>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-scsi@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: mfi panic on recused on non-recusive mutex MFI I/O lock
Message-ID: <20121106180152.GA40422@ambrisko.com>
In-Reply-To: <27169C7FE704495087A093752D15E7B6@multiplay.co.uk>
References: <2DC1C56CFFF24FE0B17C34AD21A7DFAA@multiplay.co.uk> <39D16C43C8274CE9B8F23C18459E2FD4@multiplay.co.uk> <20121105212911.GA17904@ambrisko.com> <27169C7FE704495087A093752D15E7B6@multiplay.co.uk>

On Tue, Nov 06, 2012 at 12:09:42AM -0000, Steven Hartland wrote:
| Thanks Doug, actually just finished another test run with some more
| debugging in and I believe I've found the reason for the non-recursive
| lock and at least some of the queuing issues.
|
| The non-recursive lock is due to the mfi_tbolt_reset calling
| mfi_process_fw_state_chg_isr with mfi_io_lock held which in turn calls
| mfi_tbolt_init_MFI_queue which tries to acquire mfi_io_lock hence
| the problem.
|
| mfi-lock.txt attached I believe fixes this as well as what appears
| to be an invalid call to mtx_unlock(&sc->mfi_io_lock) in mfi_attach
| which never acquires the lock as far as I can see, possibly a cut and
| paste error.

I don't seem to see the attachment.

| The invalid queue problems seem to stem from the error cases of
| the calls to mfi_mapcmd, some of which call mfi_release_command which
| blindly sets cm_flags = 0 and then enqueues it on the free queue. Now
| depending on the flow of mfi_mapcmd and where the error occurs the
| command may or may not have been put on the busy queue which is going
| to cause problems.
|
| Going to investigate this further but that's what my current theory is.
|
| Your patch seems quite extensive, so if you could give me a brief run
| down on the changes that would be most appreciated.

I'll be doing that in the commit message which should happen today.

| FYI, I'm aware that the cause of my underlying issues are some
| hardware issues (likely cable or backplane related) but it does mean
| I'm in the position to test these usually rare error cases, so wanting
| to make the most of it before we get the hardware swapped out.

That would be good. It makes it easier to debug things when it shows
the problem.

Thanks,

Doug A.