Date: Fri, 15 Apr 2011 22:37:03 +0200
From: Andre Albsmeier
To: John Baldwin
Message-ID: <20110415203703.GA89116@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org> <20110415132525.GA88202@curry.mchp.siemens.de> <201104151235.05114.jhb@freebsd.org>
In-Reply-To: <201104151235.05114.jhb@freebsd.org>
Cc: "svn-src-all@freebsd.org", "svn-src-stable-7@freebsd.org", "Albsmeier, Andre"
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys

On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > Author: jhb
> > > Date: Fri Feb 4 14:44:59 2011
> > > New Revision: 218277
> > > URL: http://svn.freebsd.org/changeset/base/218277
> > >
> > > Log:
> > >   MFC 217075:
> > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > >   interrupt config hooks to execute.
> > >
> > >   To preserve the KBI, I did not renumber priorities but simply removed
> > >   PCONFIG.
> > >
> > > Modified:
> > >   stable/7/sys/kern/subr_autoconf.c
> > >   stable/7/sys/sys/priority.h
> > > Directory Properties:
> > >   stable/7/sys/   (props changed)
> > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > >   stable/7/sys/contrib/pf/   (props changed)
> > >
> > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > ==============================================================================
> > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb 4 14:44:42 2011	(r218276)
> > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb 4 14:44:59 2011	(r218277)
> > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > >  	warned = 0;
> > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > >  		    EWOULDBLOCK) {
> > >  			mtx_unlock(&intr_config_hook_lock);
> > >  			warned++;
> >
> > This broke several of my machines in a somewhat strange way:
> >
> > After upgrading them (17) to a recent 7-STABLE (as of 2011-04-12)
> > I noticed that some (4) of them didn't start. None of the 4 found
> > their boot device anymore. What they all have in common is:
> >
> > - an Adaptec 2940 Ultra SCSI adapter
> > - two SCSI hard disks (da0 and da1) of various brands
> > - one SCSI CD-ROM drive (cd0)
> >
> > To be exact, none of the three devices (da0, da1, cd0) were
> > detected at all. Other machines with a similar configuration
> > (2940 and da0/da1) but _without_ the CD-ROM drive didn't have
> > any problems. So I simply removed the CD-ROM drives from the 4
> > machines in question and they all booted again.
> >
> > Today I decided to dig into this, and after reverting(*) the
> > above change, they worked with the CD-ROM again. I have
> > cross-checked it 3 times. No idea what's happening here...
> >
> > 	-Andre
> >
> > (*) To be honest, I used this patch, so I only had to modify one file:
> >
> > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > @@ -108,7 +108,7 @@
> >  	warned = 0;
> >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> >  		    EWOULDBLOCK) {
> >  			mtx_unlock(&intr_config_hook_lock);
> >  			warned++;
>
> Do you get any warnings about CAM timeouts, etc. when these probe?  A verbose

This is part of a verbose dmesg from a working(!) kernel:

Apr 15 12:44:33 inside kernel: splash: image decoder found: snake_saver
Apr 15 12:44:33 inside kernel: lo0: bpf attached
Apr 15 12:44:33 inside kernel: (noperiph:ahc0:0:-1:-1): SCSI bus reset delivered. 0 SCBs aborted.
Apr 15 12:44:33 inside kernel: ahc0: Selection Timeout on A:5. 0 SCBs aborted
Apr 15 12:44:33 inside kernel: ahc0: Selection Timeout on A:2. 0 SCBs aborted
Apr 15 12:44:33 inside kernel: ahc0: Selection Timeout on A:3. 0 SCBs aborted
Apr 15 12:44:33 inside kernel: ahc0: Selection Timeout on A:4. 0 SCBs aborted
Apr 15 12:44:33 inside kernel: (probe0:ahc0:0:0:0): Retrying Command
Apr 15 12:44:33 inside kernel: (probe1:ahc0:0:1:0): Retrying Command
Apr 15 12:44:33 inside kernel: (probe6:ahc0:0:6:0): Retrying Command
Apr 15 12:44:33 inside kernel: (probe6:ahc0:0:6:0): error 22
Apr 15 12:44:33 inside kernel: (probe6:ahc0:0:6:0): Unretryable Error
Apr 15 12:44:33 inside kernel: (probe6:ahc0:0:6:0): Down reving Protocol Version from 4 to 2?
Apr 15 12:44:33 inside kernel: (probe0:ahc0:0:0:0): Down reving Protocol Version from 4 to 2?
Apr 15 12:44:33 inside kernel: (probe1:ahc0:0:1:0): Down reving Protocol Version from 4 to 2?
Apr 15 12:44:33 inside kernel: (ahc0:A:6:0): Sending SDTR period c, offset f
Apr 15 12:44:33 inside kernel: (ahc0:A:6:0): Received SDTR period 19, offset f
Apr 15 12:44:33 inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 inside kernel: ahc0: target 6 synchronous at 10.0MHz, offset = 0xf
Apr 15 12:44:33 inside kernel: (ahc0:A:0:0): Sending SDTR period c, offset f
Apr 15 12:44:33 inside kernel: (ahc0:A:0:0): Received SDTR period c, offset f
Apr 15 12:44:33 inside kernel: Filtered to period c, offset f
Apr 15 12:44:33 inside kernel: ahc0: target 0 synchronous at 20.0MHz, offset = 0xf
Apr 15 12:44:33 inside kernel: (ahc0:A:1:0): Sending SDTR period c, offset f
Apr 15 12:44:33 inside kernel: (ahc0:A:1:0): Received SDTR period 19, offset f
Apr 15 12:44:33 inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 inside kernel: ahc0: target 1 synchronous at 10.0MHz, offset = 0xf
Apr 15 12:44:33 inside kernel: (ahc0:A:6:0): Sending SDTR period 19, offset f
Apr 15 12:44:33 inside kernel: (ahc0:A:6:0): Received SDTR period 19, offset f
Apr 15 12:44:33 inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 inside kernel: (probe6:ahc0:0:6:0): error 16
Apr 15 12:44:33 inside kernel: (probe6:ahc0:0:6:0): Unretryable Error
Apr 15 12:44:33 inside kernel: pass0 at ahc0 bus 0 target 0 lun 0
Apr 15 12:44:33 inside kernel: pass0: Fixed Direct Access SCSI-2 device
Apr 15 12:44:33 inside kernel: pass0: Serial Number 5U2V4605
Apr 15 12:44:33 inside kernel: pass0: 20.000MB/s transfers (20.000MHz, offset 15)
Apr 15 12:44:33 inside kernel: pass0: Command Queueing Enabled
Apr 15 12:44:33 inside kernel: pass1 at ahc0 bus 0 target 1 lun 0
Apr 15 12:44:33 inside kernel: pass1: Fixed Direct Access SCSI-2 device
Apr 15 12:44:33 inside kernel: pass1: Serial Number 5U0A7413
Apr 15 12:44:33 inside kernel: pass1: 10.000MB/s transfers (10.000MHz, offset 15)
Apr 15 12:44:33 inside kernel: pass1: Command Queueing Enabled
Apr 15 12:44:33 inside kernel: pass6 at ahc0 bus 0 target 6 lun 0
Apr 15 12:44:33 inside kernel: pass6: Removable CD-ROM SCSI-2 device
Apr 15 12:44:33 inside kernel: pass6: 10.000MB/s transfers (10.000MHz, offset 15)
<--------------- XXX ---------------->
Apr 15 12:44:33 inside kernel: GEOM: new disk da0
Apr 15 12:44:33 inside kernel: (ahc0:A:6:0): Sending SDTR period 19, offset f
Apr 15 12:44:33 inside kernel: (ahc0:A:6:0): Received SDTR period 19, offset f
Apr 15 12:44:33 inside kernel: Filtered to period 19, offset f
Apr 15 12:44:33 inside kernel: (cd0:ahc0:0:6:0): Retrying Command
Apr 15 12:44:33 inside kernel: da0 at ahc0 bus 0 target 0 lun 0

If I remember correctly, the line marked XXX is where the bad kernel
stopped and printed the panic about the missing root file system.
I am 99% sure that pass0 and pass1 were found in both cases; however,
everything related to da0 and da1 was missing with the bad kernel.

Unfortunately, I won't have access to these machines before Monday
to create the verbose dmesgs. Also, all 4 machines are in production,
but I think I can figure something out next week...

Thanks,
	-Andre