Date: Tue, 23 Apr 2013 08:02:37 -0600
From: "Kenneth D. Merry" <ken@freebsd.org>
To: Alexander Motin <mav@freebsd.org>
Cc: John <jwd@freebsd.org>, FreeBSD SCSI <freebsd-scsi@freebsd.org>
Subject: Re: Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)
Message-ID: <20130423140237.GA50775@nargothrond.kdm.org>
In-Reply-To: <517641C6.7010905@FreeBSD.org>
References: <20130422030053.GA23186@FreeBSD.org> <517641C6.7010905@FreeBSD.org>

On Tue, Apr 23, 2013 at 11:09:42 +0300, Alexander Motin wrote:
> On 22.04.2013 06:00, John wrote:
> >Hi Folks,
> >
> >   After updating one of our servers to the latest stable image,
> >it appears that commit r246437 appears to be causing it to panic.
> >
> >The commit:
> >
> >http://svnweb.freebsd.org/base?view=revision&revision=246437
> >
> >What one of our servers looks like:
> >
> >http://people.freebsd.org/~jwd/zfsnfsserver.jpg
> >
> >The last known working commit:
> >
> >http://people.freebsd.org/~jwd/r246437/dmesg.r246431.clean.txt
> >
> >With commit r246437:
> >
> >http://people.freebsd.org/~jwd/r246437/dmesg.r246437.log.txt
> >
> >Note, most of the dmesg output is related to the ses devices. It
> >repeats itself multiple times before the panic.
> >
> >ses39: ses0,pass20: Element descriptor: ' '
> >ses39: ses0,pass20: SAS Expander: 24 Physses39: phy 0: connector 255
> >other 255
> >ses39: phy 1: connector 255 other 255
> >ses39: phy 2: connector 255 other 255
> >ses39: phy 3: connector 255 other 255
> >ses39: phy 4: connector 255 other 255
> >ses39: phy 5: connector 255 other 255
> >ses39: phy 6: connector 255 other 255
> >
> >etc, etc...
>
> That is not my part of code, but I think it is just too verbose debug
> messages, that should be hidden.

Yes, it is probably too verbose, especially on such a large system.

> >After just a few minutes, the system panics. A pair of images
> >of the screen (sorry, no serial console at this time):
> >
> >Panic: http://people.freebsd.org/~jwd/r246437/20130419_160143.jpg
> >
> >bt: http://people.freebsd.org/~jwd/r246437/20130419_110158.jpg
>
> Despite that you are talking about "latest stable image", I believe your
> kernel is not latest 9-STABLE. Your backtrace reminds me about locking
> problems that should be already fixed from several sides. For example,
> on present 9-STABLE ses_path_iter_devid_callback() doesn't call
> xpt_create_path(), but calls xpt_create_path_unlocked() instead. If you
> can reproduce the issue with latest 9-STABLE, please provide respective
> information.

I agree. I added the xpt_create_path_unlocked() call to fix a panic with a
stack trace just like the one above. It looks like a problem due to running
r246437 exactly.

> >We are currently running a test to see if the fact that all our
> >shelves are dual-attached, allowing us to use geom multipath is
> >related. ie: we have disabled the 2nd HBA thus cutting the total
> >number of da & ses devices in half and thus not executing the
> >code in the commit that tracks duplicate ses devices.
> >
> >Note, if we disable both HBA devices and boot the system up it
> >does not panic or print out the repeated messages, but of course
> >we have no disks :-)
> >
> >I am unclear on the "connector 255 other 255" messages and have not
> >taken the time to look into them yet.
> >
> >I would appreciate any insights folks can provide.
> >
> >Many Thanks,
> >John
> >
> >ps: We've had to seriously increase the console buffer size to
> >capture the complete dmesg output...
> >
> >options MSGBUF_SIZE=(32768*32)
> >
> >Can we delay starting the kernel daemon until after the system
> >is up and /var/log/messages is available? Just a thought...
>
> The goal of this code was to create persistent location-dependent names
> for devices. It may be better to have them earlier.

Yes, I agree.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG