Date:      Mon, 16 Sep 2002 08:24:10 -0400 (EDT)
From:      ahd@kew.com (Drew Derbyshire)
To:        stable@freebsd.org
Subject:   vinum / 4.6.2 / mirrored drives
Message-ID:  <20020916122410.9C5B8BA14@pandora.hh.kew.com>

Short version: Are there known issues with vinum used for mirroring
SCSI drives under 4.6.2?

Long version ...

I have a Dell Gx1/PII 350 with mirrored SCSI via vinum...

   FreeBSD pandora.hh.kew.com 4.6.2-RELEASE FreeBSD 4.6.2-RELEASE #16: Fri Aug 16 23:23:07 EDT 2002     ahd@pandora.hh.kew.com:/usr/scratch/obj/usr/src/sys/DELL_GX1  i386

   ahc0: <Adaptec 29160N Ultra160 SCSI adapter> port 0xc800-0xc8ff mem 0xff000000-0xff000fff irq 11 at device 13.0 on pci0
   aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
   sa0 at ahc0 bus 0 target 4 lun 0
   sa0: <EXABYTE EXB-8505 0051> Removable Sequential Access SCSI-2 device
   sa0: 5.000MB/s transfers (5.000MHz, offset 11)
   da0 at ahc0 bus 0 target 0 lun 0
   da0: <SEAGATE ST318405LW 5063> Fixed Direct Access SCSI-3 device
   da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
   da0: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
   da1 at ahc0 bus 0 target 1 lun 0
   da1: <SEAGATE ST318405LW 5063> Fixed Direct Access SCSI-3 device
   da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
   da1: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)

The Exabyte, BTW, is external and was inactive during what follows.
The drives are internal on an Adaptec(tm) LVD cable.

Four file systems, each with their own plexes, provide the mirroring:

   2 drives:
   D d0                    State: up       Device /dev/da0s2g      Avail: 0/16739 MB (0%)
   D d1                    State: up       Device /dev/da1s2g      Avail: 0/16739 MB (0%)

   4 volumes:
   V var                   State: up       Plexes:       2 Size:        512 MB
   V usr                   State: up       Plexes:       2 Size:       2227 MB
   V export                State: up       Plexes:       2 Size:       5000 MB
   V scratch               State: up       Plexes:       2 Size:       9000 MB

   8 plexes:
   P var.p0              C State: up       Subdisks:     1 Size:        512 MB
   P usr.p0              C State: up       Subdisks:     1 Size:       2227 MB
   P export.p0           C State: up       Subdisks:     1 Size:       5000 MB
   P scratch.p0          C State: up       Subdisks:     1 Size:       9000 MB
   P var.p1              C State: up       Subdisks:     1 Size:        512 MB
   P usr.p1              C State: up       Subdisks:     1 Size:       2227 MB
   P export.p1           C State: up       Subdisks:     1 Size:       5000 MB
   P scratch.p1          C State: up       Subdisks:     1 Size:       9000 MB

   8 subdisks:
   S var.p0.s0             State: up       PO:        0  B Size:        512 MB
   S usr.p0.s0             State: up       PO:        0  B Size:       2227 MB
   S export.p0.s0          State: up       PO:        0  B Size:       5000 MB
   S scratch.p0.s0         State: up       PO:        0  B Size:       9000 MB
   S var.p1.s0             State: up       PO:        0  B Size:        512 MB
   S usr.p1.s0             State: up       PO:        0  B Size:       2227 MB
   S export.p1.s0          State: up       PO:        0  B Size:       5000 MB
   S scratch.p1.s0         State: up       PO:        0  B Size:       9000 MB

The other partitions on the drives also share the same layout: an
equal-sized NT slice, plus root and swap space.  da0 was the actual
boot drive.

In the past week, I've started seeing the following:

   (da0:ahc0:0:0:0): Invalidating pack
   fatal :scratch.p1.s0 write error, block 16769125 for 1024 bytes
   scratch.p1.s0: user buffer block 919388 for 1024 bytes

The failure occurs at random places on da0 in different subdisks,
but always on da0, perhaps a few times a day.  Restarting the drives
via vinum start in single user mode after a reboot works 98% of the
time.

Once the failure happens, it also reports problems in swap space.
I've never seen the swap space go down the tubes first.

It never failed on a reboot from da0.

The drive was closest to the controller on the cable.  For grins, I
moved both drives to spare connectors on the cable, and the problem
did not move.

(Additional SCSI state was kicked out to the console, but does not
appear in the log.  This info complains of timeouts.)

It's <expletive> intermittent, sufficiently so that I include among
my suspect causes a software timing problem.  I was able to copy
the entire drive to a ST318406LW which I swapped in, and now the
suspect drive is pulled.  Seagate diagnostics, run on a Dell P/III
with a different 29160, show no problems on the drive (not surprising,
given the intermittent nature).

How do I track this down?  (Clearly, if the new drive never fails,
it is the old drive, but how do I prove that to Seagate?)
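
One way to build a record of the failures is to tally the fatal vinum
errors per subdisk from saved console or syslog text.  The following is
only a minimal sketch: the pattern is modeled on the single "fatal"
line quoted above, the "read" alternative and the tolerance for extra
spacing are assumptions, and the function name tally_failures is
hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this message.
# Assumptions: a "read" variant exists with the same layout, and the
# spacing around the colon may vary.
FATAL_RE = re.compile(
    r"fatal\s*:\s*(?P<subdisk>\S+)\s+(?P<op>read|write) error, "
    r"block (?P<block>\d+) for (?P<nbytes>\d+) bytes"
)


def tally_failures(lines):
    """Count fatal I/O errors per subdisk and collect the failing blocks."""
    counts = Counter()
    blocks = {}
    for line in lines:
        m = FATAL_RE.search(line)
        if m is None:
            continue  # not a fatal vinum I/O line
        subdisk = m.group("subdisk")
        counts[subdisk] += 1
        blocks.setdefault(subdisk, []).append(int(m.group("block")))
    return counts, blocks


if __name__ == "__main__":
    # Sample input is the excerpt quoted earlier in this message.
    sample = [
        "(da0:ahc0:0:0:0): Invalidating pack",
        "fatal :scratch.p1.s0 write error, block 16769125 for 1024 bytes",
        "scratch.p1.s0: user buffer block 919388 for 1024 bytes",
    ]
    counts, blocks = tally_failures(sample)
    for subdisk, n in counts.most_common():
        print(subdisk, n, blocks[subdisk])
```

Feeding it the accumulated log lines would show whether the failing
blocks cluster in one region or are spread across subdisks.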

-ahd-
