Date: Mon, 12 Feb 2001 00:28:04 -0600
From: David Schooley <dcschooley@ieee.org>
To: freebsd-questions@FreeBSD.ORG
Subject: Vinum behavior (long)
Message-ID: <p04320401b6ad1e740361@[192.168.1.100]>
I have been doing some experimenting with vinum, primarily to
understand it before putting it to regular use. I have a few
questions, mostly prompted by oddities I can't explain.
The setup consists of 4 identical 30 GB ATA drives, each on its own
channel. One pair of channels comes off of the motherboard
controller; the other pair hangs off of a PCI card. I am running
4.2-STABLE, cvsup'ed some time within the past week.
The configuration file I am using is as follows and is fairly close
to the examples in the man page and elsewhere, although it raises
some questions by itself. What I attempted to do was make sure each
drive was mirrored to the corresponding drive on the other
controller, i.e., 1<->3 and 2<->4:
***
drive drive1 device /dev/ad0s1d
drive drive2 device /dev/ad2s1d
drive drive3 device /dev/ad4s1d
drive drive4 device /dev/ad6s1d
volume raid setupstate
plex org striped 300k
sd length 14655m drive drive1
sd length 14655m drive drive2
sd length 14655m drive drive3
sd length 14655m drive drive4
plex org striped 300k
sd length 14655m drive drive3
sd length 14655m drive drive4
sd length 14655m drive drive1
sd length 14655m drive drive2
***
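For reference, the sequence I used to set it up was roughly the
following; the config file name and mount point are just what I
happened to use, and the details are from memory:
***
# load the configuration above
vinum create /etc/vinum.conf
# sanity check
vinum list
# build a filesystem on the volume and mount it
# (-v because the vinum volume is not divided into partitions)
newfs -v /dev/vinum/raid
mount /dev/vinum/raid /mnt/raid
***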
I wanted to see what would happen if I lost an entire IDE controller,
so I set everything up, mounted the new volume and copied over
everything from /usr/local. I shut the machine down, cut the power to
drives 3 and 4, and restarted. Upon restart, vinum reported that
drives 3 and 4 had failed. If my understanding is correct, then I
should have been OK since any data on drives 3 and 4 would have been
a copy of what was on drives 1 and 2, respectively.
For the next part of the test, I attempted to duplicate a directory
within the copy of /usr/local on the vinum volume. It partially
worked, but there were errors during the copy and only about two
thirds of the data was copied successfully.
Question #1: Shouldn't this have worked?
After I "fixed" the "broken" controller and restarted the machine,
vinum's list looked like this:
***
4 drives:
D drive1 State: up Device /dev/ad0s1d Avail: 1/29311 MB (0%)
D drive2 State: up Device /dev/ad2s1d Avail: 1/29311 MB (0%)
D drive3 State: up Device /dev/ad4s1d Avail: 1/29311 MB (0%)
D drive4 State: up Device /dev/ad6s1d Avail: 1/29311 MB (0%)
1 volumes:
V raid State: up Plexes: 2 Size: 57 GB
2 plexes:
P raid.p0 S State: corrupt Subdisks: 4 Size: 57 GB
P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB
8 subdisks:
S raid.p0.s0 State: up PO: 0 B Size: 14 GB
S raid.p0.s1 State: up PO: 300 kB Size: 14 GB
S raid.p0.s2 State: stale PO: 600 kB Size: 14 GB
S raid.p0.s3 State: stale PO: 900 kB Size: 14 GB
S raid.p1.s0 State: stale PO: 0 B Size: 14 GB
S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB
S raid.p1.s2 State: up PO: 600 kB Size: 14 GB
S raid.p1.s3 State: up PO: 900 kB Size: 14 GB
***
This makes sense. Now after restarting raid.p0 and waiting for
everything to resync, I got this:
***
2 plexes:
P raid.p0 S State: up Subdisks: 4 Size: 57 GB
P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB
8 subdisks:
S raid.p0.s0 State: up PO: 0 B Size: 14 GB
S raid.p0.s1 State: up PO: 300 kB Size: 14 GB
S raid.p0.s2 State: up PO: 600 kB Size: 14 GB
S raid.p0.s3 State: up PO: 900 kB Size: 14 GB
S raid.p1.s0 State: stale PO: 0 B Size: 14 GB <--- still stale
S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB <--- still stale
S raid.p1.s2 State: up PO: 600 kB Size: 14 GB
S raid.p1.s3 State: up PO: 900 kB Size: 14 GB
***
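In case it matters, the restart was done with something like the
commands below; as I understand it, vinum revives each stale subdisk
by copying the corresponding data across from the other plex:
***
# revive the two stale subdisks in the first plex
vinum start raid.p0.s2
vinum start raid.p0.s3
***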
Now the only place that raid.p0.s2 and raid.p0.s3 could have gotten
their data is from raid.p1.s2 and raid.p1.s3, neither of which was
involved in the "event".
Question #2: Since the data on raid.p0 now matches raid.p1,
shouldn't raid.p1 have come up automatically and without having to
copy data from raid.p0?
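(I realize I could presumably force the state by hand with something
like the commands below, but that seems like a bad idea if vinum has
a good reason to consider the subdisks stale.)
***
# force the subdisks up without a revive -- presumably unsafe if
# anything was written while drives 3 and 4 were powered off
vinum setstate up raid.p1.s0
vinum setstate up raid.p1.s1
***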
The configuration below, which is what "mirror -s" generates, also
makes sense, but it suffers a slight performance penalty compared to
the first one.
Question #3: Is there a reason why "mirror -s" does it this way
instead of striping across all 4 disks?
I kind of prefer it this way, but I'm still curious.
***
drive drive1 device /dev/ad0s1d
drive drive2 device /dev/ad2s1d
drive drive3 device /dev/ad4s1d
drive drive4 device /dev/ad6s1d
volume raid setupstate
plex org striped 300k
sd length 29310m drive drive1
sd length 29310m drive drive2
plex org striped 300k
sd length 29310m drive drive3
sd length 29310m drive drive4
***
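For completeness, that configuration came from something like the
command below; the volume name is mine, and I don't recall the exact
order I listed the drives in:
***
# mirrored volume built from two striped plexes; the listed drives
# are divided between the two plexes by vinum
vinum mirror -s -n raid /dev/ad0s1d /dev/ad2s1d /dev/ad4s1d /dev/ad6s1d
***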
While reading through the archives, I noticed several occasions where
it was stated that a power-of-two stripe size is potentially bad
because all of the superblocks can end up on the same disk, thereby
hurting performance; yet the documentation and "mirror -s" both use
a stripe size of 256k.
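As I understand the argument, with 4 drives and a 256 kB stripe the
layout repeats every 4 x 256 kB = 1 MB, so if the filesystem's
cylinder group size happens to be a multiple of that, the metadata at
the start of every cylinder group lands on the same drive; an odd
stripe size like 300 kB gives a 1200 kB repeat, which is much less
likely to line up that way.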
Question #4: Is the power-of-two concern still valid, and if so,
shouldn't the documentation and "mirror -s" function be changed?
Thanks,
David.
--
---------------------------------------------------
David C. Schooley, Ph.D.
Transmission Operations/Technical Operations Support
Commonwealth Edison Company
work phone: 630-691-4466/(472)-4466
work email: mailto:david.c.schooley@ucm.com
home email: mailto:dcschooley@ieee.org
