From owner-freebsd-stable  Sat Dec  7 12:09:38 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2F27937B401
	for <freebsd-stable@freebsd.org>; Sat,  7 Dec 2002 12:09:36 -0800 (PST)
Received: from gs166.sp.cs.cmu.edu (GS166.SP.CS.CMU.EDU [128.2.205.169])
	by mx1.FreeBSD.org (Postfix) with SMTP id 8AE9143E4A
	for <freebsd-stable@freebsd.org>; Sat,  7 Dec 2002 12:09:35 -0800 (PST)
	(envelope-from dpelleg@gs166.sp.cs.cmu.edu)
To: Mike Hoskins <mike@adept.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: RELEASE crash - SCSI related?
References: <20021206135205.O98942-100000@fubar.adept.org>
From: Dan Pelleg <daniel+bsd@pelleg.org>
Date: 07 Dec 2002 15:09:18 -0500
In-Reply-To: <20021206135205.O98942-100000@fubar.adept.org>
Message-ID: <u2s3cp9h9qp.fsf@gs166.sp.cs.cmu.edu>
Lines: 57
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

Mike Hoskins <mike@adept.org> writes:

> On Fri, 6 Dec 2002, Dan Pelleg wrote:
> > This NFS server would crash every now and then (once in a few weeks,
> > seems to be correlated with heavy disk activity). Auto fsck will usually
> > fail and occasionally a few gigs of data will be lost. I'm beginning to
> > suspect the disk array
> 
> What sort of disks, array, etc. are you using?
> 

ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0xd800-0xd8ff mem 0xfeaff000-0xfeafffff irq 10 at device 5.1 on pci0
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
...
da2 at ahc1 bus 0 target 0 lun 0
da2: <IFT IFT-7200 0132> Fixed Direct Access SCSI-4 device 
da2: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da2: 667743MB (1367537920 512 byte sectors: 255H 63S/T 19589C)

It's a SCSI-to-ATA controller, configured as RAID-5. (In this dmesg it has
been slowed down; it usually runs at 160.)
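
As a sanity check on the da2 probe lines above, here is a small Python
sketch that redoes the arithmetic from the numbers dmesg prints. The
variable names are made up for illustration. Reading the cylinder count as
a 16-bit wraparound is an inference from the arithmetic, not something the
dmesg output states.

```python
# Figures copied from the da2 probe lines in the dmesg output.
SECTOR_BYTES = 512
sectors = 1367537920
heads, sectors_per_track = 255, 63
bus_mhz, bus_width_bits = 20.0, 16

# Capacity in MB (2**20 bytes), as dmesg reports it.
capacity_mb = sectors * SECTOR_BYTES // (1024 * 1024)
print(capacity_mb)            # 667743, matches "667743MB"

# Negotiated transfer rate: clock rate times bus width in bytes.
rate_mb_s = bus_mhz * (bus_width_bits // 8)
print(rate_mb_s)              # 40.0, matches "40.000MB/s"

# Cylinders implied by the sector count and the 255H/63S geometry.
cylinders = sectors // (heads * sectors_per_track)
print(cylinders)              # 85125
# Assumption: the reported 19589C is this value truncated to 16 bits.
print(cylinders % 65536)      # 19589
```

The capacity and transfer-rate lines are internally consistent. The
cylinder figure only agrees with the sector count if it is taken modulo
65536.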

I have softupdates on (also quotas, if that matters).

> > #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
> > #1  0xc01c1c97 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:316
> > #2  0xc01c20bc in poweroff_wait (junk=0xc032b0c0, howto=-964112384)
> >     at /usr/src/sys/kern/kern_shutdown.c:595
> > #3  0xc0172b0c in ahc_search_qinfifo (ahc=0xc688d000, target=0, channel=65 'A', lun=0,
> >     tag=210, role=ROLE_INITIATOR, status=0, action=SEARCH_COUNT)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx.c:5378
> > #4  0xc0178c04 in ahc_timeout (arg=0xc68a45a8)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx_osm.c:1608
> > #5  0xc01c7ba5 in softclock () at /usr/src/sys/kern/kern_timeout.c:131
> > #6  0xc02fa700 in splz_swi ()
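
A few of the integer arguments in the backtrace above are easier to read
once decoded. The Python sketch below does that. The RB_NOSYNC and RB_DUMP
values are given as recalled from FreeBSD's <sys/reboot.h> and should be
checked against the source tree in use. The helper name to_u32 is made up
for illustration.

```python
# Assumed flag values from <sys/reboot.h>; verify against your tree.
RB_NOSYNC = 0x004
RB_DUMP = 0x100

def to_u32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF

# Frame #1: boot (howto=260)
howto = 260
flags = [name for name, bit in (("RB_NOSYNC", RB_NOSYNC), ("RB_DUMP", RB_DUMP))
         if howto & bit]
print(hex(howto), flags)              # 0x104 ['RB_NOSYNC', 'RB_DUMP']

# Frame #2: poweroff_wait (..., howto=-964112384)
print(hex(to_u32(-964112384)))        # 0xc688d000

# Frame #3: channel=65 'A'
print(chr(65))                        # A
```

Under those assumed flag values, howto=260 decodes to "dump, and do not
sync". The "howto" shown in frame #2 is, as an unsigned number, the same
value as the ahc pointer in frame #3 (0xc688d000). One possible reading is
that frame #2 is really a panic call whose symbol was resolved to
poweroff_wait, but that is an interpretation and not something the trace
itself confirms.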
> 
> 
> This has been behaving.  Do you have a similarly configured server where
> you could try building a -STABLE snapshot?  That obviously doesn't negate
> the need to resolve this issue, but may get you up and running until a
> solution is found.
> 

Oh, I'm up and I'm running. It's just that every once in a while I'm not
"running" anymore, and if I'm unlucky, before I'm "up" again there are a
good few hours of fsck, a filled-up lost+found, and data loss.

I don't have a spare machine to test -STABLE against, and I'm not even sure
I can reproduce the crash. As I said, I suspect the array or the cabling at
this point. But while I talk to the vendors about both of these non-FreeBSD
issues, I would like to know if there's anything I could be doing at the
kernel level. For example, am I more likely to come up cleanly if I turn
softupdates off?

-- 

  Dan Pelleg
