Date: Mon, 5 Nov 2012 16:55:11 -0000
From: "Steven Hartland" <killing@multiplay.co.uk>
To: <freebsd-stable@freebsd.org>, <freebsd-scsi@freebsd.org>
Subject: Re: mfi panic on recused on non-recusive mutex MFI I/O lock
Message-ID: <39D16C43C8274CE9B8F23C18459E2FD4@multiplay.co.uk>
References: <2DC1C56CFFF24FE0B17C34AD21A7DFAA@multiplay.co.uk>
I've managed to get the machine to reproduce this fairly regularly now.

Without a debug kernel it still panics, though I believe at a later stage; the non-debug panic message is "command not in queue". In every non-debug panic I've seen, cm_flags indicates that the command being dequeued is on the busy queue, not on the free or ready queue that is being processed at the time.

The trigger seems to be the adapter reset code run from mfi_timeout. I've had a good look but can't see how a cm could be on one queue yet have its cm_flags set for a different queue, as all manipulation seems to be done via the "mfi_<method> ## name" macros, which all correctly maintain the queue / cm_flags relationship. At this point I believe it could be a thread being interrupted by a timeout part way through processing a queue request, leaving the queue and cm_flags out of sync.

Any pointers on how to debug this issue further, or fix it, would be most appreciated.

Regards
Steve

----- Original Message -----
From: "Steven Hartland"
> Testing a new machine which is based on 8.3-RELEASE with the mfi
> driver from 8-STABLE and just got a panic.
>
>
> The below is transcribed from what I hand copied from the console:-
> mfi0: sense error 0, sense_key 0, asc 0, ascq 0
> mfisyspd5: hard error cmd=write 90827650-90827905
> mfi0: I/O error, status= 46 scsi_status= 240
> mfi0: sense error 0, sense_key 0, asc 0, ascq 0
> mfisyspd5: hard error cmd=write 90827394-90827649
> mfi0: I/O error, status= 46 scsi_status= 240
> mfi0: sense error 0, sense_key 0, asc 0, ascq 0
> mfisyspd5: hard error cmd=write 90827138-90827393
> mfi0: I/O error, status= 46 scsi_status= 240
> mfi0: sense error 0, sense_key 0, asc 0, ascq 0
> mfisyspd5: hard error cmd=write 90826882-90827137
> mfi0: I/O error, status= 2 scsi_status= 2
> mfi0: sense error 112, sense_key 6, asc 41, ascq 0
> mfisyspd4: hard error cmd=write 90830466-90830721
> mfi0: I/O error, status= 2 scsi_status= 2
> mfi0: sense error 112, sense_key 6, asc 41, ascq 0
> mfisyspd5: hard error cmd=write 90830722-90830977
> mfi0: Adapter RESET condition detected
> mfi0: First state FW reset initiated...
> mfi0: ADP_RESET_TBOLT: HostDiag=a0
> mfi0: first state of reset complete, second state initiated...
> mfi0: Second state FW reset initiated...
> panic: _mtx_lock_sleep: recursed on non-recursive mutex MFI I/O lock @ /usr/src/sys/dev/mfi/mfi_tbolt:346
>
> cpuid = 6
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x178
> _mtx_lock_sleep() at _mtx_lock_sleep+0x152
> _mtx_lock_flags() at _mtx_lock_flags+0x80
> mfi_tbolt_init_MFI_queue() at mfi_tbolt_init_MFI_queue+0x72
> mfi_timeout() at mfi_timeout+0x27
> softclock() at softclock+0x2aa
> intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> ithread_loop() at ithread_loop+0xb2
> fork_exit() at fork_exit+0x135
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff80005ccd00, rbp = 0 ---
> KDB: enter panic
> [thread pid 12 tid 100020 ]
> Stopped at kdb_enter+0x3b: movq $0,0x51cb32(%rip)
> db>
>
> So questions:-
> 1. What are the "hard error" errors? The machine was testing IO
> with dd but due to the panic I can't tell if that was the cause.
> 2. Looking at the code, it seems the reset was tripped by a
> firmware bug; is that the case?
> 3. Is the fix for the panic a simple one we can test?

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.