From owner-freebsd-scsi@FreeBSD.ORG  Mon Apr 25 11:07:08 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 02800106566B
	for <freebsd-scsi@FreeBSD.org>; Mon, 25 Apr 2011 11:07:08 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id DB4C78FC21
	for <freebsd-scsi@FreeBSD.org>; Mon, 25 Apr 2011 11:07:07 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p3PB779R084683
	for <freebsd-scsi@FreeBSD.org>; Mon, 25 Apr 2011 11:07:07 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p3PB77qx084680
	for freebsd-scsi@FreeBSD.org; Mon, 25 Apr 2011 11:07:07 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 25 Apr 2011 11:07:07 GMT
Message-Id: <201104251107.p3PB77qx084680@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-scsi@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 25 Apr 2011 11:07:08 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).
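The note gives only the general URL pattern. The sketch below shows how a tracker ID from the listing maps onto it. It assumes, as the note implies, that the pr parameter takes just the number that follows the category in the Tracker column (154432 for kern/154432). pr_url() is a hypothetical helper, not part of any FreeBSD tool:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build the query-pr.cgi URL for a tracker ID such
 * as "kern/154432".  Only the numeric part after the last '/' is used.
 * Returns 0 on success, -1 if no valid number is found or buf is too small.
 */
static int
pr_url(const char *tracker, char *buf, size_t len)
{
	const char *slash = strrchr(tracker, '/');
	const char *num = slash != NULL ? slash + 1 : tracker;
	char *end;
	long pr;
	int n;

	pr = strtol(num, &end, 10);
	if (end == num || *end != '\0' || pr <= 0)
		return (-1);		/* empty or non-numeric PR number */
	n = snprintf(buf, len,
	    "http://www.freebsd.org/cgi/query-pr.cgi?pr=%ld", pr);
	return (n < 0 || (size_t)n >= len ? -1 : 0);	/* -1 on truncation */
}
```

For example, pr_url("sparc/121676", buf, sizeof(buf)) fills buf with the URL ending in pr=121676.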

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions, including
experimental development code and obsolete releases. In the table,
S is the PR state (o = open, s = suspended) and Resp. is the party responsible.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after 
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase  CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/141934  scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up 
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/124667  scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s 
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o bin/57088    scsi       [cam] [patch] for a possible fd leak in libcam.c
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than 
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

45 problems total.


From owner-freebsd-scsi@FreeBSD.ORG  Mon Apr 25 21:27:56 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B0C961065781;
	Mon, 25 Apr 2011 21:27:56 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 6E5F48FC16;
	Mon, 25 Apr 2011 21:27:56 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id 0F64E46B9C;
	Mon, 25 Apr 2011 17:27:56 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 999038A027;
	Mon, 25 Apr 2011 17:27:55 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Andre Albsmeier <Andre.Albsmeier@siemens.com>
Date: Mon, 25 Apr 2011 17:03:18 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; )
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104190920.25924.jhb@freebsd.org>
	<20110420053226.GA22854@curry.mchp.siemens.de>
In-Reply-To: <20110420053226.GA22854@curry.mchp.siemens.de>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201104251703.18518.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
	(bigwig.baldwin.cx); Mon, 25 Apr 2011 17:27:55 -0400 (EDT)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	"scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 25 Apr 2011 21:27:56 -0000

On Wednesday, April 20, 2011 1:32:26 am Andre Albsmeier wrote:
> On Tue, 19-Apr-2011 at 15:20:25 +0200, John Baldwin wrote:
> > On Tuesday, April 19, 2011 8:56:48 am Andre Albsmeier wrote:
> > > On Mon, 18-Apr-2011 at 15:18:25 +0200, John Baldwin wrote:
> > > > On Monday, April 18, 2011 7:36:57 am Andre Albsmeier wrote:
> > > > > On Fri, 15-Apr-2011 at 18:35:05 +0200, John Baldwin wrote:
> > > > > > On Friday, April 15, 2011 9:25:25 am Andre Albsmeier wrote:
> > > > > > > On Fri, 04-Feb-2011 at 14:44:59 +0000, John Baldwin wrote:
> > > > > > > > Author: jhb
> > > > > > > > Date: Fri Feb  4 14:44:59 2011
> > > > > > > > New Revision: 218277
> > > > > > > > URL: http://svn.freebsd.org/changeset/base/218277
> > > > > > > > 
> > > > > > > > Log:
> > > > > > > >   MFC 217075:
> > > > > > > >   Retire PCONFIG and leave the priority of thread0 alone when waiting for
> > > > > > > >   interrupt config hooks to execute.
> > > > > > > >   
> > > > > > > >   To preserve the KBI, I did not renumber priorities but simply removed
> > > > > > > >   PCONFIG.
> > > > > > > > 
> > > > > > > > Modified:
> > > > > > > >   stable/7/sys/kern/subr_autoconf.c
> > > > > > > >   stable/7/sys/sys/priority.h
> > > > > > > > Directory Properties:
> > > > > > > >   stable/7/sys/   (props changed)
> > > > > > > >   stable/7/sys/cddl/contrib/opensolaris/   (props changed)
> > > > > > > >   stable/7/sys/contrib/dev/acpica/   (props changed)
> > > > > > > >   stable/7/sys/contrib/pf/   (props changed)
> > > > > > > > 
> > > > > > > > Modified: stable/7/sys/kern/subr_autoconf.c
> > > > > > > > ==============================================================================
> > > > > > > > --- stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:42 2011	(r218276)
> > > > > > > > +++ stable/7/sys/kern/subr_autoconf.c	Fri Feb  4 14:44:59 2011	(r218277)
> > > > > > > > @@ -108,7 +108,7 @@ run_interrupt_driven_config_hooks(dummy)
> > > > > > > >  	warned = 0;
> > > > > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > > > > -		    PCONFIG, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > > > +		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > > >  		    EWOULDBLOCK) {
> > > > > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > > > > >  			warned++;
> > > > > > > 
> > > > > > > 
> > > > > > > This broke several of my machines in a somewhat strange way:
> > > > > > > 
> > > > > > > After upgrading 17 machines to a recent 7-STABLE (as of 2011-04-12),
> > > > > > > I noticed that 4 of them didn't start: none of the 4 found its
> > > > > > > boot device anymore. What they all have in common is:
> > > > > > > 
> > > > > > > 
> > > > > > > - an Adaptec 2940 Ultra SCSI adapter
> > > > > > > - two SCSI harddisks (da0 and da1) of various brands
> > > > > > > - one SCSI CDROM drive (cd0)
> > > > > > > 
> > > > > > > To be exact, none of the three devices (da0, da1, cd0) were
> > > > > > > detected at all. Other machines with a similar configuration
> > > > > > > (2940 and da0/da1) but _without_ the CDROM drive didn't have
> > > > > > > any problems. So I simply removed the CDROM drives on the 4
> > > > > > > machines in question and they all booted again.
> > > > > > > 
> > > > > > > Today I decided to dig into this: after reverting(*) the
> > > > > > > above change, they worked with the CDROM again. I have
> > > > > > > cross-checked it 3 times. No idea what's happening here...
> > > > > > > 
> > > > > > > 	-Andre
> > > > > > > 
> > > > > > > (*) To be precise, I used the following patch, so I had to modify only one file:
> > > > > > > 
> > > > > > > --- sys/kern/subr_autoconf.c.ORI	2011-02-05 13:14:11.000000000 +0100
> > > > > > > +++ sys/kern/subr_autoconf.c	2011-04-15 14:34:31.000000000 +0200
> > > > > > > @@ -108,7 +108,7 @@
> > > > > > >  	warned = 0;
> > > > > > >  	while (!TAILQ_EMPTY(&intr_config_hook_list)) {
> > > > > > >  		if (msleep(&intr_config_hook_list, &intr_config_hook_lock,
> > > > > > > -		    0, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > > +		    PRI_MIN_KERN + 32, "conifhk", WARNING_INTERVAL_SECS * hz) ==
> > > > > > >  		    EWOULDBLOCK) {
> > > > > > >  			mtx_unlock(&intr_config_hook_lock);
> > > > > > >  			warned++;
> > > > > > 
> > > > > > Do you get any warnings about CAM timeouts, etc., when these devices probe?
> > > > > > A verbose dmesg would be nice to look at, if possible.
> > > > > 
> > > > > OK, I have set up a machine for testing. In my other mail
> > > > > I was wrong in saying that the pass devices appear when using
> > > > > the problematic kernel...
> > > > > 
> > > > > Here are the dmesgs:
> > > > > 
> > > > > - dmesg_bad is the original kernel as of Friday
> > > > > - dmesg_ok is the patched kernel (see above) as of Friday
> > > > > - dmesg.diff is the diff between both
> > > > > 
> > > > > If you want me to try something, just tell me...
> > > > 
> > > > Hmmm, what if you make SCSI_DELAY larger?  Also, can you let it fail the
> > > > mount and drop into ddb and then get 'ps' output?
> > > 
> > > As soon as I include the debugger in the kernel, the problem
> > > is gone. I have double-checked it twice now: with the debugger,
> > > the drives are detected; without it, they mostly (but not always)
> > > are not.
> > > 
> > > I currently have it running in an endless reboot loop, hoping
> > > that it fails eventually...
> > 
> > Hummm.  This looks like a timing-related race. :(
> 
> Success! Sometime during the night it finally failed, even with the
> debugger in the kernel. Here is the output of 'ps' and some other
> commands I remembered (no idea if any of these make sense in this
> context :-)). It is still in this state with the serial console
> attached, so just tell me what to type ;-).
> 
> 
> KDB: enter: manual escape to debugger
> [thread pid 1 tid 100001 ]
> Stopped at      kdb_enter_why+0x3b:     xorl    %eax,%eax
> db> ps
>   pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
>    35     0     0     0  RL                          [softdepflush]
>    34     0     0     0  RL                          [syncer]
>    33     0     0     0  RL                          [vnlru]
>    32     0     0     0  RL                          [bufdaemon]
>    31     0     0     0  RL                          [pagezero]
>    30     0     0     0  RL                          [idlepoll]
>    29     0     0     0  RL                          [vmdaemon]
>    28     0     0     0  RL                          [pagedaemon]
>    27     0     0     0  WL                          [irq1: atkbd0]
>    26     0     0     0  WL                          [swi0: uart uart]
>    25     0     0     0  SL      -        0xc182a63c [fdc0]
>    24     0     0     0  SL      idle     0xc1829600 [aic_recovery0]
>    23     0     0     0  WL                          [irq11: ahc0]
>    22     0     0     0  SL      idle     0xc1829600 [aic_recovery0]
>    21     0     0     0  WL                          [irq10: fxp0]
>    20     0     0     0  WL                          [irq9: acpi0 intsmb0]
>    19     0     0     0  SL      -        0xc181b800 [kqueue taskq]
>    18     0     0     0  WL                          [swi6: task queue]
>    17     0     0     0  WL                          [swi6: Giant taskq]
>     9     0     0     0  RL                          [thread taskq]
>    16     0     0     0  WL                          [swi5: Fast task queue]
>    15     0     0     0  WL                          [swi2: cambio]
>     8     0     0     0  SL      ccb_scan 0xc0766714 [xpt_thrd]
>     7     0     0     0  SL      -        0xc181bd80 [acpi_task_2]
>     6     0     0     0  SL      -        0xc181bd80 [acpi_task_1]
>     5     0     0     0  SL      -        0xc181bd80 [acpi_task_0]
>    14     0     0     0  SL      -        0xc077be54 [yarrow]
>     4     0     0     0  SL      -        0xc077942c [g_down]
>     3     0     0     0  SL      -        0xc0779428 [g_up]
>     2     0     0     0  SL      -        0xc0779420 [g_event]
>    13     0     0     0  WL                          [swi3: vm]
>    12     0     0     0  LL     *Giant    0xc1821dc0 [swi4: clock]
>    11     0     0     0  WL                          [swi1: net]
>    10     0     0     0  RL                          [idle]
>     1     0     0     0  RL      CPU 0               [swapper]
>     0     0     0     0  SLs     sched    0xc07794c0 [swapper]

Hummm, I don't see any "important" threads in a runnable state that I
would think were denied the ability to run by the priority change.  I wonder
what callout swi4 is trying to run (or what other callouts might be
stuck waiting for Giant).  I'd be tempted to add some KTR traces or
debugging printfs to try to figure out what sequence of events is
happening when (e.g., when the xpt thread kicks off each scan CCB, when 
xpt_rescan_done() is called, and probably some events in the ahc driver).
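The following is a minimal userland sketch of the idea behind such tracing. It assumes nothing about how KTR is implemented internally. Each event is recorded into a small ring buffer as a (sequence number, static string, integer) tuple. Recording is therefore cheap and avoids the timing perturbation a console printf can introduce. The buffer is dumped once the hang has been reproduced. The names trace_event(), trace_count(), trace_get() and trace_dump() are hypothetical. In the kernel itself, the KTR facility (the CTR0() through CTR6() macros from <sys/ktr.h>, compiled in with "options KTR" and viewed from ddb with "show ktr") serves this purpose.

```c
#include <stddef.h>
#include <stdio.h>

#define TRACE_ENTRIES 4		/* hypothetical ring size, tiny for illustration */

struct trace_entry {
	unsigned long	 seq;	/* global order in which events were recorded */
	const char	*what;	/* static string: nothing is formatted at record time */
	long		 arg;	/* one integer argument, e.g. a target ID */
};

static struct trace_entry trace_buf[TRACE_ENTRIES];
static unsigned long trace_seq;	/* total number of events ever recorded */

/* Record one event, overwriting the oldest entry once the ring is full. */
static void
trace_event(const char *what, long arg)
{
	struct trace_entry *te = &trace_buf[trace_seq % TRACE_ENTRIES];

	te->seq = trace_seq++;
	te->what = what;
	te->arg = arg;
}

/* Number of entries still held in the ring. */
static size_t
trace_count(void)
{
	return (trace_seq < TRACE_ENTRIES ? (size_t)trace_seq : TRACE_ENTRIES);
}

/* The i-th surviving entry, oldest first (0 <= i < trace_count()). */
static const struct trace_entry *
trace_get(size_t i)
{
	unsigned long oldest = trace_seq - trace_count();

	return (&trace_buf[(oldest + i) % TRACE_ENTRIES]);
}

/* Print the surviving entries, oldest first, after the fact. */
static void
trace_dump(FILE *fp)
{
	size_t i;

	for (i = 0; i < trace_count(); i++) {
		const struct trace_entry *te = trace_get(i);

		fprintf(fp, "%lu: %s %ld\n", te->seq, te->what, te->arg);
	}
}
```

A caller would record an event at each point of interest, for example trace_event("xpt: scan ccb queued, target", 1) when a scan CCB is issued and another when xpt_rescan_done() runs. trace_dump(stdout) would then be called after the failure. The ring keeps only the most recent events, which is usually what matters when a boot wedges.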

-- 
John Baldwin

From owner-freebsd-scsi@FreeBSD.ORG  Tue Apr 26 05:52:06 2011
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 646F2106566B;
	Tue, 26 Apr 2011 05:52:06 +0000 (UTC)
	(envelope-from Andre.Albsmeier@siemens.com)
Received: from thoth.sbs.de (thoth.sbs.de [192.35.17.2])
	by mx1.freebsd.org (Postfix) with ESMTP id D8A128FC08;
	Tue, 26 Apr 2011 05:52:05 +0000 (UTC)
Received: from mail2.siemens.de (localhost [127.0.0.1])
	by thoth.sbs.de (8.13.6/8.13.6) with ESMTP id p3Q5q4L0018787;
	Tue, 26 Apr 2011 07:52:04 +0200
Received: from curry.mchp.siemens.de (curry.mchp.siemens.de [139.25.40.130])
	by mail2.siemens.de (8.13.6/8.13.6) with ESMTP id p3Q5q4Yo005703;
	Tue, 26 Apr 2011 07:52:04 +0200
Received: (from localhost)
	by curry.mchp.siemens.de (8.14.4/8.14.4) id p3Q5q4Qs039552;
Date: Tue, 26 Apr 2011 07:52:04 +0200
From: Andre Albsmeier <Andre.Albsmeier@siemens.com>
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20110426055204.GA59975@curry.mchp.siemens.de>
References: <201102041444.p14EixJP087709@svn.freebsd.org>
	<201104190920.25924.jhb@freebsd.org>
	<20110420053226.GA22854@curry.mchp.siemens.de>
	<201104251703.18518.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201104251703.18518.jhb@freebsd.org>
X-Echelon: <censored>
X-Advice: Drop that crappy M$-Outlook, I'm tired of your viruses!
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: "svn-src-stable-7@freebsd.org" <svn-src-stable-7@freebsd.org>,
	"scsi@freebsd.org" <scsi@freebsd.org>
Subject: Re: svn commit: r218277 - in stable/7/sys: kern sys
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Apr 2011 05:52:06 -0000

On Mon, 25-Apr-2011 at 23:03:18 +0200, John Baldwin wrote:
> Hummm, I don't see any "important" threads in a runnable state that I
> would think were denied the ability to run by the priority change.  I wonder
> what callout swi4 is trying to run (or what other callouts might be
> stuck waiting for Giant).  I'd be tempted to add some KTR traces or
> debugging printfs to try to figure out what sequence of events is
> happening when (e.g., when the xpt thread kicks off each scan CCB, when 
> xpt_rescan_done() is called, and probably some events in the ahc driver).

I am happy to try whatever you suggest ;-) But I don't want to
steal too much of your time -- it seems that by reverting the
initial commit everything runs fine for me, and I can easily
live with that...

Thanks,

	-Andre

-- 
UNIX is an operating system, OS/2 is half an operating system,
Windows is a shell, and DOS is a bootsector virus.