From owner-freebsd-scsi Thu Oct 12 7:34:30 2000
Delivered-To: freebsd-scsi@freebsd.org
Received: from verdi.nethelp.no (verdi.nethelp.no [158.36.41.162])
	by hub.freebsd.org (Postfix) with SMTP id ABFA837B502
	for ; Thu, 12 Oct 2000 07:34:26 -0700 (PDT)
Received: (qmail 54204 invoked by uid 1001); 12 Oct 2000 14:34:24 +0000 (GMT)
To: gibbs@scsiguy.com
Cc: freebsd-scsi@FreeBSD.ORG
Subject: Re: Stressed SCSI subsystem locks up the system
From: sthaug@nethelp.no
In-Reply-To: Your message of "Wed, 11 Oct 2000 05:27:31 +0000"
References: <200010110527.e9B5RV603276@aslan.scsiguy.com>
X-Mailer: Mew version 1.05+ on Emacs 19.34.2
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Date: Thu, 12 Oct 2000 16:34:24 +0200
Message-ID: <54202.971361264@verdi.nethelp.no>
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> As always, I am interested in knowing the details of this problem and
> would like to resolve it. The easiest way to do this is to switch
> over to using 4.1-stable built from source so I can work directly with
> the site to debug the problem.

We have a similar problem (may not be the same). We have a mail server
with the following SCSI configuration:

ahc0: port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 10 at device 14.0 on pci0
aic7890/91: Wide Channel A, SCSI Id=7, 32/255 SCBs
sa0 at ahc0 bus 0 target 2 lun 0
sa0: Removable Sequential Access SCSI-2 device
sa0: 7.812MB/s transfers (7.812MHz, offset 15)
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-2 device
da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit)
da0: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)
da1 at ahc0 bus 0 target 6 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: 80.000MB/s transfers (40.000MHz, offset 15, 16bit)
da1: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)

This server has been extremely stable with 4.1-STABLE and earlier.
With 4.1.1-STABLE we have had two cases of the system crashing with
"page fault while in kernel mode" - and then it hangs while trying to
sync the disks (but still responds to ping!).

The instruction pointer that is printed is 0xc0135167 (same in both
cases), which is inside ahc_action():

c0134ca8 T ahc_done
c0134f78 t ahc_action
c01358bc t ahc_get_tran_settings

Specifically, line 441 in ahc_action, from

$FreeBSD: src/sys/dev/aic7xxx/aic7xxx_freebsd.c,v 1.3.2.1 2000/09/23 00:24:03 gibbs Exp $

    436         if ((scb = ahc_get_scb(ahc)) == NULL) {
    437
    438                 ahc_lock(ahc, &s);
    439                 ahc->flags |= AHC_RESOURCE_SHORTAGE;
    440                 ahc_unlock(ahc, &s);
    441                 xpt_freeze_simq(sim, /*count*/1);
    442                 ahc_set_transaction_status(scb, CAM_REQUEUE_REQ);
    443                 xpt_done(ccb);
    444                 return;

Line 441 of "../../dev/aic7xxx/aic7xxx_freebsd.c" starts at address
0xc0135159 and ends at 0xc0135175.

At the moment I'm tempted to simply revert to the 4.1-STABLE code on
this host. It looks like the differences between 4.1-STABLE and
4.1.1-STABLE are rather large; aic7xxx_freebsd.c doesn't exist in
4.1-STABLE, and ahc_action is in aic7xxx.c instead:

$FreeBSD: src/sys/dev/aic7xxx/aic7xxx.c,v 1.41.2.1 2000/03/18 23:00:11 gibbs Exp $

Any suggestions before I revert to 4.1-STABLE?

Steinar Haug, Nethelp consulting, sthaug@nethelp.no