Date: Tue, 19 Apr 2011 14:56:48 +0200
From: Andre Albsmeier <Andre.Albsmeier@siemens.com>
To: John Baldwin <jhb@freebsd.org>
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
    "scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
Message-ID: <20110419125648.GA17780@curry.mchp.siemens.de>
In-Reply-To: <201104180918.26054.jhb@freebsd.org>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
    <201104151235.05114.jhb@freebsd.org>
    <20110418113657.GA6080@curry.mchp.siemens.de>
    <201104180918.26054.jhb@freebsd.org>
On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > Author: jhb
> > > > > Date: Fri Feb 4 14:44:59 2011
> > > > > New Revision: 218277
> > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > >
> > > > > Log:
> > > > >   MFC 217075:
> > > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > >   interrupt config hooks to execute.
> > > > >
> > > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > > >   PCONFIG.
> > > > >
> > > > > Modified:
> > > > >   stable/7/sys/kern/subr_autoconf.c
> > > > >   stable/7/sys/sys/priority.h
> > > > > Directory Properties:
> > > > >   stable/7/sys/ (props changed)
> > > > >   stable/7/sys/cddl/contrib/opensolaris/ (props changed)
> > > > >   stable/7/sys/contrib/dev/acpica/ (props changed)
> > > > >   stable/7/sys/contrib/pf/ (props changed)
> > > > >
> > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > ==============================================================================
> > > > > --- stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:42 2011 (r218276)
> > > > > +++ stable/7/sys/kern/subr_autoconf.c Fri Feb 4 14:44:59 2011 (r218277)
> > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > >      warned = 0;
> > > > >      while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > >          if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > -            PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > +            0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > >              EWOULDBLOCK) {
> > > > >              mtx_unlock(&intr_config_hook_lock);
> > > > >              warned++;
> > > >
> > > >
> > > > This broke several of my machines in a somewhat strange way:
> > > >
> > > > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > > > I noticed that some (4) of them didn't start. All 4 didn't find
> > > > their boot device anymore. What they all got in common is:
> > > >
> > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > - one SCSI CDROM drive (cd0)
> > > >
> > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > detected at all. Other machines with a similar configuration
> > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > any problems. So I simply removed the CDROM drives on the 4
> > > > machines in question and they all booted again.
> > > >
> > > > Today I decided to dig into this and after reverting(*) the
> > > > above change, they worked with the CDROM again. I have
> > > > cross-checked it 3 times. No idea what's happening here...
> > > >
> > > > -Andre
> > > >
> > > > (*) To be honest, I use this patch so I had to modify only one file:
> > > >
> > > > --- sys/kern/subr_autoconf.c.ORI 2011-02-05 13:14:11.000000000 +0100
> > > > +++ sys/kern/subr_autoconf.c 2011-04-15 14:34:31.000000000 +0200
> > > > @@ -108,7 +108,7 @@
> > > >      warned = 0;
> > > >      while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > >          if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > -            0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > +            PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > >              EWOULDBLOCK) {
> > > >              mtx_unlock(&intr_config_hook_lock);
> > > >              warned++;
> > >
> > > Do you get any warnings about CAM timeouts, etc. when these probe? A verbose
> > > dmesg might be nice to look at if possible.
> >
> > OK, I have set up a machine for testing. In my other mail
> > I was wrong saying that the pass devices appear when using
> > the problematic kernel...
> >
> > Here are the dmesgs:
> >
> > - dmesg_bad is the original kernel as of Friday
> > - dmesg_ok is the patched kernel (see above) as of Friday
> > - dmesg.diff is the diff between both
> >
> > If you want me to try something just tell me...
>
> Hmmm, what if you make SCSI_DELAY larger? Also, can you let it fail the
> mount and drop into ddb and then get 'ps' output?

As soon as I include the debugger into the kernel the problem
is gone. I have double-checked it two times now: With debugger
the drives are detected, without debugger mostly (but not
always) not. I currently have it running in an endless rebooting
loop hoping, that it fails eventually...

-Andre

>
> I think the CAM boot probe is broken a bit. xpt_rescan_done() always calls
> xpt_release_boot(), but we don't hold the boot for each bus added while
> buses_config_done is 0, so it seems CAM only waits for at least one bus to
> rescan before it lets the boot continue? This seems wrong (i.e. one would
> think it would let all the busses added before this point scan before
> continuing).
>
> However, in your dmesg, it starts to print out an announcement for a pass
> device before it starts mounting root, so it seems that xpt is finishing too
> early somehow.
>
> --
> John Baldwin

--
UNIX is an operating system, OS/2 is half an operating system,
Windows is a shell, and DOS is a bootsector virus.