From: Scott Long
Date: Mon, 19 Feb 2007 16:31:11 -0700
To: "Bjoern A. Zeeb"
Cc: FreeBSD current mailing list
Subject: Re: [mfi] command timeouts
Message-ID: <45DA333F.7070800@pooker.samsco.org>
In-Reply-To: <20070219135158.E47107@maildrop.int.zabbadoz.net>
References: <20070219130102.N47107@maildrop.int.zabbadoz.net> <20070219135158.E47107@maildrop.int.zabbadoz.net>

Bjoern A. Zeeb wrote:
> On Mon, 19 Feb 2007, Bjoern A. Zeeb wrote:
>
>> Hi,
>>
>> I am testing mfi on a Dell 2950 with 6 PD, 2LD (1st LD=RAID1,
>> 2nd LD=RAID5, 1HTSP).
>> (The somewhat sucky) megacli "works".
>>
>> While most commands to gather information work fine, as do pulling out
>> disks hard, setting a disk offline or running some other commands hangs
>> 'something', which might be the controller?
>>
>> For example:
>>
>> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>>
>> EnclId-1 SlotId-3 state changed to OffLine.
>> foo# foo# ls -l
>>
>>
>> It's not only this process but all disk IO related processes.
>>
>>
>> On the serial console I get:
>>
>> ...
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 732 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 684 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 679 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 44 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 763 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 715 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 710 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 75 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 793 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c040 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3b8d0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cb68 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bd98 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3bc88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cbf0 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cc78 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cf20 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cd88 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3cfa8 TIMEOUT AFTER 794 SECONDS
>> mfi0: COMMAND 0xffffffff80c3d828 TIMEOUT AFTER 746 SECONDS
>> mfi0: COMMAND 0xffffffff80c3db58 TIMEOUT AFTER 741 SECONDS
>> mfi0: COMMAND 0xffffffff80c3de88 TIMEOUT AFTER 106 SECONDS
>> mfi0: COMMAND 0xffffffff80c3c728 TIMEOUT AFTER 824 SECONDS
>> ...
>>
>>
>> I can still break to ddb. Without disk I/O, the only
>> possible thing I can really do is type reset.
>>
>> I'll build a debugging kernel so I can do show alllocks, etc
>> but if someone with more experience with this driver/hw could
>> contact me I can run further tests.
>
>
> this time with the debugging kernel:
>
> foo# megacli -PDOffline -PhysDrv'[1:3]' -a0
>
> EnclId-1 SlotId-3 state changed to OffLine.
> foo# foo# foo# foo#
>
>
> I was able to hit multiple times after the "uh it still lives"
> but then ...
>
> command 0xffffffff80c40000 not in queue, flags = 0x20, bit = 0x80
> panic: command not in queue
> cpuid = 2
> Uptime: 1m17s
> Physical memory: 4084 MB
> Dumping 199 MB: 184 168 152 136 120 104 88 72 56 40 24 8
> Dump complete
>
> telnet> send brk
> KDB: enter: Line break on console
> [thread pid 15 tid 100009 ]
> Stopped at kdb_enter+0x2f: nop
> db> where
> Tracing pid 15 tid 100009 td 0xffffff012f5c4000
> kdb_enter() at kdb_enter+0x2f
> siointr1() at siointr1+0x400
> siointr() at siointr+0x2e
> intr_execute_handlers() at intr_execute_handlers+0x124
> Xapic_isr1() at Xapic_isr1+0x7f
> --- interrupt, rip = 0xffffffff803c9787, rsp = 0xffffffffac06eb30, rbp =
> 0xffffffffac06eb60 ---
> _mtx_lock_sleep() at _mtx_lock_sleep+0x137
> _mtx_lock_flags() at _mtx_lock_flags+0xe1
> mfi_timeout() at mfi_timeout+0x32
> softclock() at softclock+0x1c8
> ithread_loop() at ithread_loop+0xfe
> fork_exit() at fork_exit+0xaa
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffffffac06ed40, rbp = 0 ---
> db> show alllocks
> Process 24 (irq78: mfi0) thread 0xffffff012f5c5000 (100020)
> exclusive sleep mutex MFI I/O lock r = 0 (0xffffff012f5cc630) locked @
> /u1/src/HEAD/sys/dev/mfi/mfi.c:775
>
>
> After the reboot it does not seem that the command
> was executed as the disk still seems to be online (at least
> it was the last time).
>

megacli is known to be fragile. Don't Do That (tm). As for the panic,
it's probably a side effect of megacli putting the card and the driver
into a chaotic state.

Scott