Date: Mon, 16 Sep 2002 15:49:35 -0700
From: "Jamie Heckford" <jamie@jamiesdomain.org.uk>
To: <stable@FreeBSD.ORG>, "Drew Derbyshire" <ahd@kew.com>
Subject: Re: vinum / 4.6.2 / mirrored drives
Message-ID: <001201c25dd3$56dc0e60$3764a8c0@BONG>
References: <20020916122410.9C5B8BA14@pandora.hh.kew.com>

Hi,

I've had quite a few problems _similar_ to this in the past; most were
solved by replacing the SCSI cable and double-checking the termination,
believe it or not!

There is most likely a proper explanation for your problem, but it
couldn't hurt to check that it's all OK.

Cheers
Jamie

----- Original Message -----
From: "Drew Derbyshire" <ahd@kew.com>
To: <stable@FreeBSD.ORG>
Sent: Monday, September 16, 2002 5:24 AM
Subject: vinum / 4.6.2 / mirrored drives

> Short version: Are there known issues with vinum used for mirroring
> SCSI drives under 4.6.2?
>
> Long version ...
>
> I have a Dell Gx1/PII 350 with mirrored SCSI via vinum...
>
> FreeBSD pandora.hh.kew.com 4.6.2-RELEASE FreeBSD 4.6.2-RELEASE #16: Fri Aug 16 23:23:07 EDT 2002 ahd@pandora.hh.kew.com:/usr/scratch/obj/usr/src/sys/DELL_GX1 i386
>
> ahc0: <Adaptec 29160N Ultra160 SCSI adapter> port 0xc800-0xc8ff mem 0xff000000-0xff000fff irq 11 at device 13.0 on pci0
> aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> sa0 at ahc0 bus 0 target 4 lun 0
> sa0: <EXABYTE EXB-8505 0051> Removable Sequential Access SCSI-2 device
> sa0: 5.000MB/s transfers (5.000MHz, offset 11)
> da0 at ahc0 bus 0 target 0 lun 0
> da0: <SEAGATE ST318405LW 5063> Fixed Direct Access SCSI-3 device
> da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da0: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
> da1 at ahc0 bus 0 target 1 lun 0
> da1: <SEAGATE ST318405LW 5063> Fixed Direct Access SCSI-3 device
> da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
> da1: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
>
> The Exabyte, BTW, is external and was inactive during what follows.
> The drives are internal on an Adaptec(tm) LVD cable.
>
> Four file systems, each with their own plexes, provide the mirroring:
>
> 2 drives:
> D d0                State: up   Device /dev/da0s2g   Avail: 0/16739 MB (0%)
> D d1                State: up   Device /dev/da1s2g   Avail: 0/16739 MB (0%)
>
> 4 volumes:
> V var               State: up   Plexes: 2   Size: 512 MB
> V usr               State: up   Plexes: 2   Size: 2227 MB
> V export            State: up   Plexes: 2   Size: 5000 MB
> V scratch           State: up   Plexes: 2   Size: 9000 MB
>
> 8 plexes:
> P var.p0        C   State: up   Subdisks: 1   Size: 512 MB
> P usr.p0        C   State: up   Subdisks: 1   Size: 2227 MB
> P export.p0     C   State: up   Subdisks: 1   Size: 5000 MB
> P scratch.p0    C   State: up   Subdisks: 1   Size: 9000 MB
> P var.p1        C   State: up   Subdisks: 1   Size: 512 MB
> P usr.p1        C   State: up   Subdisks: 1   Size: 2227 MB
> P export.p1     C   State: up   Subdisks: 1   Size: 5000 MB
> P scratch.p1    C   State: up   Subdisks: 1   Size: 9000 MB
>
> 8 subdisks:
> S var.p0.s0         State: up   PO: 0 B   Size: 512 MB
> S usr.p0.s0         State: up   PO: 0 B   Size: 2227 MB
> S export.p0.s0      State: up   PO: 0 B   Size: 5000 MB
> S scratch.p0.s0     State: up   PO: 0 B   Size: 9000 MB
> S var.p1.s0         State: up   PO: 0 B   Size: 512 MB
> S usr.p1.s0         State: up   PO: 0 B   Size: 2227 MB
> S export.p1.s0      State: up   PO: 0 B   Size: 5000 MB
> S scratch.p1.s0     State: up   PO: 0 B   Size: 9000 MB
>
> The other partitions on the drives are also the same layout; equal
> sized NT slice, and root and swap space. da0 was the actual boot drive.
>
> In the past week, I've started seeing the following:
>
> (da0:ahc0:0:0:0): Invalidating pack
> fatal :scratch.p1.s0 write error, block 16769125 for 1024 bytes
> scratch.p1.s0: user buffer block 919388 for 1024 bytes
>
> The failure occurs at random places on da0 in different subdisks,
> but always da0, perhaps a few times a day. Restarting the drives
> via vinum start in single user mode after a reboot works 98% of the
> time.
>
> Once the failure happens, it also reports problems in swap space.
> I've never seen the swap space go down the tubes first.
>
> It never failed on a reboot from da0.
>
> The drive was closest to the controller on the cable; for grins I
> moved both drives to spare connectors on the cable and the problem
> did not move.
>
> (Additional SCSI state was kicked out to the console, but does not
> appear in the log. This info complains of timeouts.)
>
> It's <expletive> intermittent, sufficiently so that I include among
> my suspect causes a software timing problem. I was able to copy
> the entire drive to a ST318406LW which I swapped in, and now the
> suspect drive is pulled. Seagate diagnostics, run on a Dell P/III
> with a different 29160, show no problems on the drive (not surprising,
> given the intermittent nature.)
>
> How do I track this down? (Clearly, if the new drive never fails,
> it is the old drive, but how do I prove that to Seagate?)
>
> -ahd-
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
>

--
____________________________________________________
Message scanned for viruses and dangerous content by
<http://www.newnet.co.uk/av/> and believed to be clean
