Date: Mon, 22 Feb 2010 10:39:11 +1100
From: Lawrence Stewart <lstewart@freebsd.org>
To: Alexander Motin <mav@FreeBSD.org>
Cc: svn-src-stable@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, svn-src-stable-8@FreeBSD.org
Subject: Re: svn commit: r203889 - in stable/8/sys: cam cam/ata cam/scsi dev/ahci dev/asr dev/ata dev/ciss dev/hptiop dev/hptrr dev/mly dev/mpt dev/ppbus dev/siis dev/trm dev/twa dev/usb/storage
Message-ID: <4B81C41F.2080601@freebsd.org>
In-Reply-To: <4B7EC763.4090507@FreeBSD.org>
References: <201002141938.o1EJcRpx065470@svn.freebsd.org> <4B7D4962.8070706@freebsd.org> <4B7EC763.4090507@FreeBSD.org>

On 02/20/10 04:16, Alexander Motin wrote:
> Lawrence Stewart wrote:
>> A couple of times it has gotten even more upset reporting things like this:
>>
>> mpt0: mpt_cam_event: 0x16
>> mpt0: mpt_cam_event: 0x16
>> mpt0: request 0xffffff80002f1400:54058 timed out for ccb
>> 0xffffff0001c65000 (req->ccb 0xffffff0001c65000)
>> mpt0: attempting to abort req 0xffffff80002f1400:54058 function 0
>> mpt0: request 0xffffff80002fd100:54059 timed out for ccb
>> 0xffffff009f3ec800 (req->ccb 0xffffff009f3ec800)
>> mpt0: request 0xffffff80002efcf0:54060 timed out for ccb
>> 0xffffff0001bd2000 (req->ccb 0xffffff0001bd2000)
>> mpt0: mpt_recover_commands: IOC Status 0x4a. Resetting controller.
>> mpt0: mpt_cam_event: 0x0
>> mpt0: mpt_cam_event: 0x0
>> mpt0: completing timedout/aborted req 0xffffff80002f1400:54058
>> mpt0: completing timedout/aborted req 0xffffff80002fd100:54059
>> mpt0: completing timedout/aborted req 0xffffff80002efcf0:54060
>> mpt0: mpt_cam_event: 0x16
>> mpt0: mpt_cam_event: 0x12
>> mpt0: mpt_cam_event: 0x12
>> mpt0: mpt_cam_event: 0x16
>> mpt0: Volume(0:2): Volume Status Changed
>> mpt0: request 0xffffff80002f8990:0 timed out for ccb 0xffffff009f3cb800
>> (req->ccb 0)
>>
>> No ill effects are observed after such an episode and the array remains
>> in healthy as-normal state. The only observable problem is the stall of
>> all disk IO while these events occur.
>
> I have no idea how mpt driver works, neither I have hardware to play,
> but quick look shows that 0x12 event is MPI_EVENT_SAS_PHY_LINK_STATUS,
> and 0x16 is MPI_EVENT_SAS_DISCOVERY. Both are not handled by mpt driver
> and so logged. I would say something is going on at physical level of
> your SAN. Timeouts are also could be the result of physical issues.

Ok, I'll try and figure out what's possibly going on.

>
>> As best I can tell, the hardware is ok, both disks report as fine
>> without SMART errors and are only 2 months old, so wanted to rule out
>> software issues. On upgrading to recent 8-STABLE, I got a page fault
>> kernel panic on boot in the mpt driver mpt_raid0 kproc. After some trial
>> and error, r203888 is the most recent revision that boots fine, whilst
>> r203889 exhibits the page fault. I should also note that r203888 still
>> sees the "mpt0: mpt_cam_event: 0x16" messages and associated disk IO
>> stalls.
>>
>> I compiled DDB into my r203889 kernel. Unfortunately my ILO emulates a
>> USB keyboard so I can't do anything in DDB which is a huge pain, but
>> here's the info I did get (hand transcribed):
>>
>> Fatal trap 12: page fault while in kernel mode
>> current process: mpt_raid0
>> Stopped at xpt_rescan+0x1d: movq 0x10(%rsi),%rdx
>>
>> 1. Any thoughts on how to resolve the regression in the mpt driver with
>> the r203889 commit?
>
> Any thoughts where to find a good telepath? :)
>
> For the beginning, show at least verbose boot messages up to the crash.
> Full panic message could also be useful, it may show address of the
> fault instruction, which may be resolved to source line with addr2line
> tool. If you could find a good old PS/2 keyboard, backtrace would be
> interesting to see.

2 issues:

- The server is in colocated rack space and not easy to get to
- I'm not even sure that this server has PS2 ports on it

Perhaps this commit should be backed out of 8-STABLE until we get a chance to diagnose a bit more?

Cheers,
Lawrence
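
For anyone reading a saved console log with the same symptoms, the two event codes identified in the reply quoted above (0x12 and 0x16) can be attached to the raw "mpt_cam_event" lines with a short script. This is only a rough sketch, not part of the mpt driver: the names MPT_EVENT_NAMES, annotate and count_events are made up for illustration, and the table deliberately holds only the two codes named in this thread, so any other code (such as the 0x0 seen after the controller reset) is reported as not listed.

```python
import re

# Only the two codes named in this thread; everything else is left unnamed.
MPT_EVENT_NAMES = {
    0x12: "MPI_EVENT_SAS_PHY_LINK_STATUS",
    0x16: "MPI_EVENT_SAS_DISCOVERY",
}

# Matches lines such as "mpt0: mpt_cam_event: 0x16" (quoted or not).
EVENT_RE = re.compile(r"mpt\d+: mpt_cam_event: (0x[0-9a-fA-F]+)")


def annotate(line):
    """Append the event name to an mpt_cam_event line; pass other lines through."""
    match = EVENT_RE.search(line)
    if match is None:
        return line
    code = int(match.group(1), 16)
    name = MPT_EVENT_NAMES.get(code, "not listed in this table")
    return f"{line}  ({name})"


def count_events(lines):
    """Return a dict mapping each event code seen to its number of occurrences."""
    counts = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is not None:
            code = int(match.group(1), 16)
            counts[code] = counts.get(code, 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "mpt0: mpt_cam_event: 0x16",
        "mpt0: mpt_cam_event: 0x12",
        "mpt0: Volume(0:2): Volume Status Changed",
    ]
    for entry in sample:
        print(annotate(entry))
```

Counting the events per code over a longer log may help show whether the link-status and discovery events cluster around the IO stalls, which is the physical-layer explanation suggested in the reply.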