Date: Tue, 23 Apr 2013 11:09:42 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: John <jwd@FreeBSD.org>
Cc: FreeBSD SCSI <freebsd-scsi@freebsd.org>
Subject: Re: Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)
Message-ID: <517641C6.7010905@FreeBSD.org>
In-Reply-To: <20130422030053.GA23186@FreeBSD.org>
References: <20130422030053.GA23186@FreeBSD.org>
On 22.04.2013 06:00, John wrote:
> Hi Folks,
>
> After updating one of our servers to the latest stable image,
> it appears that commit r246437 appears to be causing it to panic.
>
> The commit:
>
> http://svnweb.freebsd.org/base?view=revision&revision=246437
>
> What one of our servers looks like:
>
> http://people.freebsd.org/~jwd/zfsnfsserver.jpg
>
> The last known working commit:
>
> http://people.freebsd.org/~jwd/r246437/dmesg.r246431.clean.txt
>
> With commit r246437:
>
> http://people.freebsd.org/~jwd/r246437/dmesg.r246437.log.txt
>
> Note, most of the dmesg output is related to the ses devices. It
> repeats itself multiple times before the panic.
>
> ses39: ses0,pass20: Element descriptor: ' '
> ses39: ses0,pass20: SAS Expander: 24 Physses39: phy 0: connector 255 other 255
> ses39: phy 1: connector 255 other 255
> ses39: phy 2: connector 255 other 255
> ses39: phy 3: connector 255 other 255
> ses39: phy 4: connector 255 other 255
> ses39: phy 5: connector 255 other 255
> ses39: phy 6: connector 255 other 255
>
> etc, etc...

That is not my part of the code, but I think these are just overly verbose debug messages that should be hidden.

> After just a few minutes, the system panics. A pair of images
> of the screen (sorry, no serial console at this time):
>
> Panic: http://people.freebsd.org/~jwd/r246437/20130419_160143.jpg
>
> bt: http://people.freebsd.org/~jwd/r246437/20130419_110158.jpg

Although you mention the "latest stable image", I believe your kernel is not the latest 9-STABLE. Your backtrace reminds me of locking problems that should already be fixed from several sides. For example, on present 9-STABLE ses_path_iter_devid_callback() doesn't call xpt_create_path(), but calls xpt_create_path_unlocked() instead. If you can reproduce the issue with the latest 9-STABLE, please provide the respective information.

> We are currently running a test to see if the fact that all our
> shelves are dual-attached, allowing us to use geom multipath is
> related. ie: we have disabled the 2nd HBA thus cutting the total
> number of da & ses devices in half and thus not executing the
> code in the commit that tracks duplicate ses devices.
>
> Note, if we disable both HBA devices and boot the system up it
> does not panic or print out the repeated messages, but of course
> we have no disks :-)
>
> I am unclear on the "connector 255 other 255" messages and have not
> taken the time to look into them yet.
>
> I would appreciate any insights folks can provide.
>
> Many Thanks,
> John
>
> ps: We've had to seriously increase the console buffer size to
> capture the complete dmesg output...
>
> options MSGBUF_SIZE=(32768*32)
>
> Can we delay starting the kernel daemon until after the system
> is up and /var/log/messages is available? Just a thought...

The goal of this code was to create persistent location-dependent names for devices. It may be better to have them earlier.

-- 
Alexander Motin