From owner-freebsd-stable Sat Dec 7 12:09:38 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2F27937B401
	for ; Sat, 7 Dec 2002 12:09:36 -0800 (PST)
Received: from gs166.sp.cs.cmu.edu (GS166.SP.CS.CMU.EDU [128.2.205.169])
	by mx1.FreeBSD.org (Postfix) with SMTP id 8AE9143E4A
	for ; Sat, 7 Dec 2002 12:09:35 -0800 (PST)
	(envelope-from dpelleg@gs166.sp.cs.cmu.edu)
To: Mike Hoskins
Cc: freebsd-stable@freebsd.org
Subject: Re: RELEASE crash - SCSI related?
References: <20021206135205.O98942-100000@fubar.adept.org>
From: Dan Pelleg
Date: 07 Dec 2002 15:09:18 -0500
In-Reply-To: <20021206135205.O98942-100000@fubar.adept.org>
Lines: 57
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Mike Hoskins writes:

> On Fri, 6 Dec 2002, Dan Pelleg wrote:
> > This NFS server would crash every now and then (once in a few weeks,
> > seems to be correlated with heavy disk activity). Auto fsck will usually
> > fail and occasionally a few gigs of data will be lost. I'm beginning to
> > suspect the disk array
>
> What sort of disks, array, etc. are you using?
>

ahc1: port 0xd800-0xd8ff mem 0xfeaff000-0xfeafffff irq 10 at device 5.1 on pci0
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
...
da2 at ahc1 bus 0 target 0 lun 0
da2: Fixed Direct Access SCSI-4 device
da2: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da2: 667743MB (1367537920 512 byte sectors: 255H 63S/T 19589C)

It's a SCSI-to-ATA controller (in this dmesg it's slowed down; it usually
runs at 160), configured as RAID-5. I have softupdates on (also quotas, if
that matters).

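As a sanity check on the dmesg figures above, here is a minimal Python
sketch. It is not part of the original report. It assumes that dmesg counts
a MB as 2^20 bytes and that the negotiated rate is simply the bus clock
times the bus width in bytes; the function and variable names are made up
for illustration.

```python
# Figures copied from the da2 lines in the dmesg output above.
SECTOR_BYTES = 512
SECTORS = 1367537920
BUS_CLOCK_MHZ = 20.0   # "20.000MHz"
BUS_WIDTH_BITS = 16    # "16bit"
USUAL_RATE_MB_S = 160  # "it usually runs at 160"


def size_in_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in MB, assuming 1 MB = 2**20 bytes (truncated)."""
    return sectors * sector_bytes // (1024 * 1024)


def transfer_rate_mb_s(clock_mhz, width_bits):
    """Assumed model: one transfer of width_bits per clock cycle."""
    return clock_mhz * width_bits / 8


size_mb = size_in_mb(SECTORS)
rate = transfer_rate_mb_s(BUS_CLOCK_MHZ, BUS_WIDTH_BITS)
slowdown = USUAL_RATE_MB_S / rate

print(f"array size: {size_mb} MB")
print(f"negotiated rate: {rate} MB/s")
print(f"slowdown vs. usual rate: {slowdown}x")
```

Under those assumptions both numbers agree with what dmesg printed
(667743MB and 40.000MB/s), and the slowed-down bus is running at a quarter
of its usual rate.
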
> > #0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
> > #1  0xc01c1c97 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:316
> > #2  0xc01c20bc in poweroff_wait (junk=0xc032b0c0, howto=-964112384)
> >     at /usr/src/sys/kern/kern_shutdown.c:595
> > #3  0xc0172b0c in ahc_search_qinfifo (ahc=0xc688d000, target=0, channel=65 'A', lun=0,
> >     tag=210, role=ROLE_INITIATOR, status=0, action=SEARCH_COUNT)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx.c:5378
> > #4  0xc0178c04 in ahc_timeout (arg=0xc68a45a8)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx_osm.c:1608
> > #5  0xc01c7ba5 in softclock () at /usr/src/sys/kern/kern_timeout.c:131
> > #6  0xc02fa700 in splz_swi ()
> >
> This has been behaving. Do you have a similarly configured server where
> you could try building a -STABLE snapshot? That obviously doesn't negate
> the need to resolve this issue, but may get you up and running until a
> solution is found.
>

Oh, I'm up and I'm running. It's just that every once in a while I'm not
"running" anymore, and if I'm unlucky, before I'm "up" again there are a
good few hours of fsck, a filled-up lost+found, and data loss.

I don't have a spare to test -STABLE against. I'm not even sure I can
reproduce the crash. As I said, I'm suspecting the array or the cabling at
this point. But while I'm talking to vendors to address both of these
non-FreeBSD issues, I would like to know if there's anything at the kernel
level I could be doing. For example, am I more likely to come up cleanly if
I turn softupdates off?

-- 
Dan Pelleg