Date: Mon, 19 Feb 2007 16:31:11 -0700
From: Scott Long <scottl@pooker.samsco.org>
To: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Cc: FreeBSD current mailing list <current@freebsd.org>
Subject: Re: [mfi] command timeouts
Message-ID: <45DA333F.7070800@pooker.samsco.org>
In-Reply-To: <20070219135158.E47107@maildrop.int.zabbadoz.net>
References: <20070219130102.N47107@maildrop.int.zabbadoz.net> <20070219135158.E47107@maildrop.int.zabbadoz.net>
Bjoern A. Zeeb wrote:
> On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:
>
>> Hi,
>>
>> I am testing mfi on a Dell 2950 with 6 PD, 2LD (1st LD=RAID1,
>> 2nd LD=RAID5, 1HTSP).
>> (The somewhat sucky) megacli "works".
>>
>> While most commands to gather information work fine, as do pulling out
>> disks hard, setting a disk offline or running some other commands hangs
>> 'something', which might be the controller?
>>
>> For example:
>>
>> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>>
>> EnclId-1 SlotId-3 state changed to OffLine.
>> foo# foo# ls -l
>> <hangs forever>
>>
>> It's not only this process but all disk IO related processes.
>>
>>
>> On the serial console I get:
>>
>> ...
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
>> ...
>>
>>
>> I can still break to ddb. Without disk I/O, the only
>> possible thing I can really do is type reset.
>>
>> I'll build a debugging kernel so I can do show alllocks, etc
>> but if someone with more experience with this driver/hw could
>> contact me I can run further tests.
>
>
> this time with the debugging kernel:
>
> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>
> EnclId-1 SlotId-3 state changed to OffLine.
> foo# foo# foo# foo#
>
>
> I was able to hit <enter> multiple times after the "uh it still lives"
> but then ...
>
> command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
> panic: command not in queue
> cpuid = 2
> Uptime: 1m17s
> Physical memory: 4084 MB
> Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
> Dump complete
>
> telnet> send brk
> KDB: enter: Line break on console
> [thread pid 15 tid 100009 ]
> Stopped at kdb_enter+0x2f: nop
> db> where
> Tracing pid 15 tid 100009 td 0xffffff012f5c4000
> kdb_enter() at kdb_enter+0x2f
> siointr1() at siointr1+0x400
> siointr() at siointr+0x2e
> intr_execute_handlers() at intr_execute_handlers+0x124
> Xapic_isr1() at Xapic_isr1+0x7f
> --- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp =
> 0xffffffffac06eb60 ---
> _mtx_lock_sleep() at _mtx_lock_sleep+0x137
> _mtx_lock_flags() at _mtx_lock_flags+0xe1
> mfi_timeout() at mfi_timeout+0x32
> softclock() at softclock+0x1c8
> ithread_loop() at ithread_loop+0xfe
> fork_exit() at fork_exit+0xaa
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
> db> show alllocks
> Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
> exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @
> /u1/src/HEAD/sys/dev/mfi/mfi.c:775
>
>
> After the reboot it does not seem that the command
> was executed as the disk still seems to be online (at least
> it was the last time).
>

megacli is known to be fragile. Don't Do That (tm). As for the panic,
it's probably a side effect of megacli putting the card and the driver
into a chaotic state.

Scott