Date: 07 Dec 2002 15:09:18 -0500
From: Dan Pelleg <daniel+bsd@pelleg.org>
To: Mike Hoskins <mike@adept.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: RELEASE crash - SCSI related?
Message-ID: <u2s3cp9h9qp.fsf@gs166.sp.cs.cmu.edu>
In-Reply-To: <20021206135205.O98942-100000@fubar.adept.org>
References: <20021206135205.O98942-100000@fubar.adept.org>
Mike Hoskins <mike@adept.org> writes:

> On Fri, 6 Dec 2002, Dan Pelleg wrote:
>
> > This NFS server would crash every now and then (once in a few weeks,
> > seems to be correlated with heavy disk activity). Auto fsck will usually
> > fail and occasionally a few gigs of data will be lost. I'm beginning to
> > suspect the disk array
>
> What sort of disks, array, etc. are you using?
>

ahc1: <Adaptec aic7899 Ultra160 SCSI adapter> port 0xd800-0xd8ff mem 0xfeaff000-0xfeafffff irq 10 at device 5.1 on pci0
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
...
da2 at ahc1 bus 0 target 0 lun 0
da2: <IFT IFT-7200 0132> Fixed Direct Access SCSI-4 device
da2: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da2: 667743MB (1367537920 512 byte sectors: 255H 63S/T 19589C)

It's a SCSI-to-ATA controller (in this dmesg it's slowed down, it usually
runs at 160), configured as RAID-5. I have softupdates on (also quotas, if
that matters).

> > #0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
> > #1 0xc01c1c97 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:316
> > #2 0xc01c20bc in poweroff_wait (junk=0xc032b0c0, howto=-964112384)
> >     at /usr/src/sys/kern/kern_shutdown.c:595
> > #3 0xc0172b0c in ahc_search_qinfifo (ahc=0xc688d000, target=0, channel=65 'A', lun=0,
> >     tag=210, role=ROLE_INITIATOR, status=0, action=SEARCH_COUNT)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx.c:5378
> > #4 0xc0178c04 in ahc_timeout (arg=0xc68a45a8)
> >     at /usr/src/sys/dev/aic7xxx/aic7xxx_osm.c:1608
> > #5 0xc01c7ba5 in softclock () at /usr/src/sys/kern/kern_timeout.c:131
> > #6 0xc02fa700 in splz_swi ()
> >
> This has been behaving. Do you have a similarly configured server where
> you could try building a -STABLE snapshot? That obviously doesn't negate
> the need to resolve this issue, but may get you up and running until a
> solution is found.
>

Oh, I'm up and I'm running.
It's just that every once in a while I'm not "running" anymore, and if
I'm unlucky, before I'm "up" again there are a good few hours of fsck, a
filled-up lost+found, and data loss.

I don't have a spare to test -STABLE against. I'm not even sure I can
reproduce the crash. As I said, I'm suspecting the array or the cabling at
this point. But while I'm talking to vendors to address both of these
non-FreeBSD issues, I would like to know if there's anything at the kernel
level I could be doing. For example, am I more likely to come up cleanly if
I turn softupdates off?

--
Dan Pelleg

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message