Date:      Fri, 15 Apr 2011 22:37:03 +0200
From:      Andre Albsmeier <Andre.Albsmeier@siemens.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>, "Albsmeier, Andre" <andre.albsmeier@siemens.com>
Subject:   Re: svn commit: r218277 - in stable/7/sys: kern sys
Message-ID:  <20110415203703.GA89116@curry.mchp.siemens.de>
In-Reply-To: <201104151235.05114.jhb@freebsd.org>
References:  <201102041444.p14EixJP087709@svn.freebsd.org> <20110415132525.GA88202@curry.mchp.siemens.de> <201104151235.05114.jhb@freebsd.org>

On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > Author: jhb
> > > Date: Fri Feb  4 14:44:59 2011
> > > New Revision: 218277
> > > URL: http://svn.freebsd.org/changeset/base/218277
> > > 
> > > Log:
> > >   MFC 217075:
> > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > >   interrupt config hooks to execute.
> > >   
> > >   To preserve the KBI, I did not renumber priorities but simply removed
> > >   PCONFIG.
> > > 
> > > Modified:
> > >   stable/7/sys/kern/subr_autoconf.c
> > >   stable/7/sys/sys/priority.h
> > > Directory Properties:
> > >   stable/7/sys/   (props changed)
> > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > >   stable/7/sys/contrib/pf/   (props changed)
> > > 
> > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > ==============================================================================
> > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > >  	warned = 0;
> > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > >  		    EWOULDBLOCK) {
> > >  			mtx_unlock(&intr_config_hook_lock);
> > >  			warned++;
> > 
> > 
> > This broke several of my machines in a somewhat strange way:
> > 
> > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > I noticed that some (4) of them didn't start. All 4 didn't find
> > their boot device anymore. What they all got in common is:
> > 
> > - an Adaptec 2940 Ultra SCSI adapter
> > - two SCSI harddisks (da0 and da1) of various brands
> > - one SCSI CDROM drive (cd0)
> > 
> > To be exact, none of the three devices (da0, da1, cd0) were
> > detected at all. Other machines with a similar configuration
> > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > any problems. So I simply removed the CDROM drives on the 4
> > machines in question and they all booted again.
> > 
> > Today I decided to dig into this and after reverting(*) the
> > above change, they worked with the CDROM again. I have cross-
> > checked it 3 times. No idea what's happening here...
> > 
> > 	-Andre
> > 
> > (*) To be honest, I use this patch so I had to modify only one file:
> > 
> > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > @@ -108,7 +108,7 @@
> >  	warned = 0;
> >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> >  		    EWOULDBLOCK) {
> >  			mtx_unlock(&intr_config_hook_lock);
> >  			warned++;
> 
> Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose 

This is part of a verbose dmesg from a working(!) kernel:

Apr 15 12:44:33 <kern.crit> inside kernel: splash: image decoder found: snake_saver
Apr 15 12:44:33 <kern.crit> inside kernel: lo0: bpf attached
Apr 15 12:44:33 <kern.crit> inside kernel: (noperiph:ahc0:0:-1:-1): SCSI bus reset delivered. 0 SCBs aborted.
Apr 15 12:44:33 <kern.crit> inside kernel: ahc0: Selection Timeout on A:5. 0 SCBs aborted
Apr 15 12:44:33 <kern.crit> inside kernel: ahc0: Selection Timeout on A:2. 0 SCBs aborted
Apr 15 12:44:33 <kern.crit> inside kernel: ahc0: Selection Timeout on A:3. 0 SCBs aborted
Apr 15 12:44:33 <kern.crit> inside kernel: ahc0: Selection Timeout on A:4. 0 SCBs aborted
Apr 15 12:44:33 <kern.crit> inside kernel: (probe0:ahc0:0:0:0): Retrying Command
Apr 15 12:44:33 <kern.crit> inside kernel: (probe1:ahc0:0:1:0): Retrying Command
Apr 15 12:44:33 <kern.crit> inside kernel: (probe6:ahc0:0:6:0): Retrying Command
Apr 15 12:44:33 <kern.crit> inside kernel: (probe6:ahc0:0:6:0): error 22
Apr 15 12:44:33 <kern.crit> inside kernel: (probe6:ahc0:0:6:0): Unretryable Error
Apr 15 12:44:33 <kern.crit> inside kernel: (probe6:ahc0:0:6:0): Down reving Protocol Version from 4 to 2?
Apr 15 12:44:33 <kern.crit> inside kernel: (probe0:ahc0:0:0:0): Down reving Protocol Version from 4 to 2?
Apr 15 12:44:33 <kern.crit> inside kernel: (probe1:ahc0:0:1:0): Down reving Protocol Version from 4 to 2?
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:6:0): Sending SDTR period c, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:6:0): Received SDTR period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: ahc0: target 6 synchronous at 10.0MHz, offset = 0xf
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:0:0): Sending SDTR period c, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:0:0): Received SDTR period c, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: Filtered to period c, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: ahc0: target 0 synchronous at 20.0MHz, offset = 0xf
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:1:0): Sending SDTR period c, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:1:0): Received SDTR period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: ahc0: target 1 synchronous at 10.0MHz, offset = 0xf
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:6:0): Sending SDTR period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:6:0): Received SDTR period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: (probe6:ahc0:0:6:0): error 16
Apr 15 12:44:33 <kern.crit> inside kernel: (probe6:ahc0:0:6:0): Unretryable Error
Apr 15 12:44:33 <kern.crit> inside kernel: pass0 at ahc0 bus 0 target 0 lun 0
Apr 15 12:44:33 <kern.crit> inside kernel: pass0: <IBM DORS-32160 WA6A> Fixed Direct Access SCSI-2 device
Apr 15 12:44:33 <kern.crit> inside kernel: pass0: Serial Number 5U2V4605
Apr 15 12:44:33 <kern.crit> inside kernel: pass0: 20.000MB/s transfers (20.000MHz, offset 15)
Apr 15 12:44:33 <kern.crit> inside kernel: pass0: Command Queueing Enabled
Apr 15 12:44:33 <kern.crit> inside kernel: pass1 at ahc0 bus 0 target 1 lun 0
Apr 15 12:44:33 <kern.crit> inside kernel: pass1: <IBM DORS-32160 S82C> Fixed Direct Access SCSI-2 device
Apr 15 12:44:33 <kern.crit> inside kernel: pass1: Serial Number 5U0A7413
Apr 15 12:44:33 <kern.crit> inside kernel: pass1: 10.000MB/s transfers (10.000MHz, offset 15)
Apr 15 12:44:33 <kern.crit> inside kernel: pass1: Command Queueing Enabled
Apr 15 12:44:33 <kern.crit> inside kernel: pass6 at ahc0 bus 0 target 6 lun 0
Apr 15 12:44:33 <kern.crit> inside kernel: pass6: <TOSHIBA CD-ROM XM-6201TA 1400> Removable CD-ROM SCSI-2 device
Apr 15 12:44:33 <kern.crit> inside kernel: pass6: 10.000MB/s transfers (10.000MHz, offset 15)

<--------------- XXX ---------------->

Apr 15 12:44:33 <kern.crit> inside kernel: GEOM: new disk da0
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:6:0): Sending SDTR period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: (ahc0:A:6:0): Received SDTR period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 <kern.crit> inside kernel: (cd0:ahc0:0:6:0): Retrying Command
Apr 15 12:44:33 <kern.crit> inside kernel: da0 at ahc0 bus 0 target 0 lun 0


If I remember correctly, the line marked XXX is where the
bad kernel stopped and spat out the panic about the missing root
fs. I am 99% sure that pass0 and pass1 were found in both
cases; however, everything related to da0 and da1 was missing
with the bad kernel.
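
For reference, the only difference between the good and the bad kernel is
the priority argument passed to msleep() in the hunk quoted above: 0 in
r218277, PRI_MIN_KERN + 32 (the old PCONFIG) in the revert. A minimal
userland sketch of that difference follows. It assumes PRI_MIN_KERN is 64
and PRIMASK is 0x0ff, as I believe they are in stable/7; neither value is
stated in this thread. PCONFIG_OLD and sleep_priority() are hypothetical
names used only for illustration, and this is a model of the argument
handling, not the kernel code itself.

```c
/* Assumed values, modelled on stable/7 sys/priority.h and sys/param.h. */
#define PRI_MIN_KERN	64
#define PRIMASK		0x0ff	/* low bits of the msleep() priority argument */

/* Hypothetical name for the value PCONFIG had before it was retired. */
#define PCONFIG_OLD	(PRI_MIN_KERN + 32)

/*
 * Hypothetical helper: the priority a thread would sleep at, given its
 * current priority and the priority argument passed to msleep().
 * A masked value of 0 means "leave the thread's priority alone".
 */
static int
sleep_priority(int current, int arg)
{
	int pri = arg & PRIMASK;

	return (pri != 0 ? pri : current);
}
```

With a made-up current priority of 8, sleep_priority(8, 0) stays at 8 (the
r218277 behaviour), while sleep_priority(8, PCONFIG_OLD) moves the thread
to 96 (the behaviour restored by the revert). Whether that priority change
is what lets the CAM probe finish is not established here.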

Unfortunately, I have no access to these machines before Monday
to create the verbose dmesgs. Also, all 4 machines are in
production, but I think I can figure something out next week...

Thanks,

	-Andre


