Date: Mon, 19 Feb 2007 13:55:47 +0000 (UTC)
From: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
To: FreeBSD current mailing list <current@freebsd.org>
Subject: Re: [mfi] command timeouts
Message-ID: <20070219135158.E47107@maildrop.int.zabbadoz.net>
In-Reply-To: <20070219130102.N47107@maildrop.int.zabbadoz.net>
References: <20070219130102.N47107@maildrop.int.zabbadoz.net>

On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:

> Hi,
>
> I am testing mfi on a Dell 2950 with 6 PD, 2LD (1st LD=RAID1,
> 2nd LD=RAID5, 1HTSP).
> (The somewhat sucky) megacli "works".
>
> While most commands to gather information work fine, as do pulling out
> disks hard, setting a disk offline or running some other commands hangs
> 'something', which might be the controller?
>
> For example:
>
> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>
> EnclId-1 SlotId-3 state changed to OffLine.
> foo# foo# ls -l
> <hangs forever>
>
> It's not only this process but all disk IO related processes.
>
>
> On the serial console I get:
>
> ...
> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
> ...
>
>
> I can still break to ddb. Without disk I/O, the only
> possible thing I can really do is type reset.
>
> I'll build a debugging kernel so I can do show alllocks, etc
> but if someone with more experience with this driver/hw could
> contact me I can run further tests.

this time with the debugging kernel:

foo# megacli -PDOffline -PhysDrv'[1:3]' -a0

EnclId-1 SlotId-3 state changed to OffLine.
foo# foo# foo# foo#

I was able to hit <enter> multiple times after the "uh it still lives"
but then ...

command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
panic: command not in queue
cpuid = 2
Uptime: 1m17s
Physical memory: 4084 MB
Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
Dump complete

telnet> send brk
KDB: enter: Line break on console
[thread pid 15 tid 100009 ]
Stopped at kdb_enter+0x2f: nop
db> where
Tracing pid 15 tid 100009 td 0xffffff012f5c4000
kdb_enter() at kdb_enter+0x2f
siointr1() at siointr1+0x400
siointr() at siointr+0x2e
intr_execute_handlers() at intr_execute_handlers+0x124
Xapic_isr1() at Xapic_isr1+0x7f
--- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp = 0xffffffffac06eb60 ---
_mtx_lock_sleep() at _mtx_lock_sleep+0x137
_mtx_lock_flags() at _mtx_lock_flags+0xe1
mfi_timeout() at mfi_timeout+0x32
softclock() at softclock+0x1c8
ithread_loop() at ithread_loop+0xfe
fork_exit() at fork_exit+0xaa
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
db> show alllocks
Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @ /u1/src/HEAD/sys/dev/mfi/mfi.c:775

After the reboot it does not seem that the command was executed as the
disk still seems to be online (at least it was the last time).

-- 
Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT
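
In the serial console output quoted above, the same command addresses come back with ages roughly 31 seconds apart (for example 732, 763, 794 and 44, 75, 106), which looks like the same stuck commands being reported again rather than new ones failing. A minimal Python sketch for grouping such lines by command address follows. The name group_timeouts and the regular expression are illustrative assumptions, not part of the mfi driver or megacli, and the sample holds only a few of the lines from the log.

```python
import re
from collections import defaultdict

# Matches console lines of the form seen in the log, e.g.
# "mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS"
TIMEOUT_RE = re.compile(r"\w+: COMMAND (0x[0-9a-f]+) TIMEOUT AFTER (\d+) SECONDS")


def group_timeouts(log_text):
    """Map each command address to its reported ages, in log order."""
    ages = defaultdict(list)
    for match in TIMEOUT_RE.finditer(log_text):
        ages[match.group(1)].append(int(match.group(2)))
    return dict(ages)


# A few lines copied from the console output in this message.
SAMPLE = """\
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
"""

if __name__ == "__main__":
    for addr, seen in group_timeouts(SAMPLE).items():
        # Differences between consecutive reports of the same command.
        deltas = [later - earlier for earlier, later in zip(seen, seen[1:])]
        print(addr, seen, deltas)
```

Feeding the full console capture to group_timeouts instead of SAMPLE would show whether any command ever drops out of the list between reports.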