Date:      Mon, 19 Feb 2007 16:31:11 -0700
From:      Scott Long <scottl@pooker.samsco.org>
To:        "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Cc:        FreeBSD current mailing list <current@freebsd.org>
Subject:   Re: [mfi] command timeouts
Message-ID:  <45DA333F.7070800@pooker.samsco.org>
In-Reply-To: <20070219135158.E47107@maildrop.int.zabbadoz.net>
References:  <20070219130102.N47107@maildrop.int.zabbadoz.net> <20070219135158.E47107@maildrop.int.zabbadoz.net>

Bjoern A. Zeeb wrote:
> On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:
> 
>> Hi,
>>
>> I am testing mfi on a Dell 2950 with 6 PD, 2LD (1st LD=RAID1,
>> 2nd LD=RAID5, 1HTSP).
>> (The somewhat sucky) megacli "works".
>>
>> While most information-gathering commands work fine, as does pulling
>> out disks hard, setting a disk offline or running some other commands
>> hangs 'something', which might be the controller.
>>
>> For example:
>>
>> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>>
>> EnclId-1 SlotId-3 state changed to OffLine.
>> foo# foo# ls -l
>> <hangs forever>
>>
>> It's not only this process but all disk IO related processes.
>>
>>
>> On the serial console I get:
>>
>> ...
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
>> ...
>>
>>
>> I can still break to ddb. Without disk I/O, the only
>> possible thing I can really do is type reset.
>>
>> I'll build a debugging kernel so I can run show alllocks, etc.,
>> but if someone with more experience with this driver/hw could
>> contact me, I can run further tests.
> 
> 
> this time with the debugging kernel:
> 
> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
> 
> EnclId-1 SlotId-3 state changed to OffLine.
> foo# foo# foo# foo#
> 
> 
> I was able to hit <enter> multiple times after the "uh it still lives"
> but then ...
> 
> command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
> panic: command not in queue
> cpuid = 2
> Uptime: 1m17s
> Physical memory: 4084 MB
> Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
> Dump complete
> 
> telnet> send brk
> KDB: enter: Line break on console
> [thread pid 15 tid 100009 ]
> Stopped at      kdb_enter+0x2f: nop
> db> where
> Tracing pid 15 tid 100009 td 0xffffff012f5c4000
> kdb_enter() at kdb_enter+0x2f
> siointr1() at siointr1+0x400
> siointr() at siointr+0x2e
> intr_execute_handlers() at intr_execute_handlers+0x124
> Xapic_isr1() at Xapic_isr1+0x7f
> --- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp = 0xffffffffac06eb60 ---
> _mtx_lock_sleep() at _mtx_lock_sleep+0x137
> _mtx_lock_flags() at _mtx_lock_flags+0xe1
> mfi_timeout() at mfi_timeout+0x32
> softclock() at softclock+0x1c8
> ithread_loop() at ithread_loop+0xfe
> fork_exit() at fork_exit+0xaa
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
> db> show alllocks
> Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
> exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @ /u1/src/HEAD/sys/dev/mfi/mfi.c:775
> 
> 
> After the reboot it does not seem that the command
> was executed as the disk still seems to be online (at least
> it was the last time).
> 

megacli is known to be fragile.  Don't Do That (tm).  As for the panic,
it's probably a side effect of megacli putting the card and the driver
into a chaotic state.

Scott



