Date: Fri, 25 May 2018 21:14:04 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Cc: James Rose <james.rose@framestore.com>
Subject: pNFS mirror file distribution (big picture question?)
Message-ID: <YTXPR0101MB09595418D7B853DBCEB9458CDD690@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM>
Hi,
#1 The code currently in projects/pnfs-planb-server allows creation of sets
of mirrored data servers (DSs). For example, the "-p" nfsd option argument:
  nfsv4-data0#nfsv4-data1,nfsv4-data2#nfsv4-data3
defines two mirrored sets of data servers with two servers in each one.
("#" separates mirrors within a mirror set.)
I did this a couple of years ago, in part because I thought having a well
defined "mirror" for a DS would facilitate mirror recovery.
Now that I have completed the mirror recovery code, having a defined mirror
set is no longer needed.
#2 An alternate mirroring approach would be what I might call the
random/distributed approach, where each file is distributed on any two (or
more) of the DSs. For this approach, the "-p" nfsd option argument:
  nfsv4-data0,nfsv4-data1,nfsv4-data2,nfsv4-data3
defines four DSs, and a separate flag would say "two way mirroring", so
each file would be placed on 2 of the 4 DSs.
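Again as a sketch (the actual flag is not settled, so "-m 2" below is only a
stand-in for whatever ends up selecting "two way mirroring"):

  nfs_server_flags="-u -t -m 2 -p nfsv4-data0,nfsv4-data1,nfsv4-data2,nfsv4-data3"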
The question is "should I switch the code to approach #2?".
I won't call it elegant, but #1 is neat and tidy, since the sysadmin knows
that a data file ends up on either the nfsv4-data0/nfsv4-data1 pair or the
nfsv4-data2/nfsv4-data3 pair.
Assuming the mirrored DSs in a set have the same amount of storage, they
will have the same amount of free space.
--> This implies that they will run out of space at the same time, and the
    pNFS service won't be able to write to files on the mirror set.
With #2, one of the DSs will probably run out of space first. I think this
will cause a client trying to write a file on it to report a write error to
the Metadata Server (MDS), and that will cause the DS to be taken offline.
Then the write will succeed on the other mirror and things will continue to
run.
Eventually all the DSs will fill up, but hopefully a sysadmin can step in
and fix the "out of space" problem before that point.
Another advantage I can see for #2 is that it gives the MDS more
flexibility when it chooses which DSs to create the data files on than #1
does.
(It will be less "neat and tidy", but the sysadmin can find out which DSs
store the data for a file, on the MDS, on a "per file" basis.)
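To make the #2 placement policy concrete, here is a toy user-space C sketch
(not the actual kernel code; all names are made up) of how the MDS could
pick the DSs for a new data file: choose "mirror level" distinct DSs at
random, skipping any that have been taken offline after a write error:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NUMDS		4
#define MIRROR_LEVEL	2

struct ds {
	const char *name;
	bool online;		/* cleared when the MDS sees a write error */
};

/*
 * Pick "level" distinct online DSs at random, storing their indices in
 * sel[].  Returns the number actually selected, which can be less than
 * "level" if too many DSs are offline.
 */
static int
pick_mirrors(const struct ds *dss, int numds, int level, int *sel)
{
	int cand[numds], ncand = 0, nsel = 0;

	for (int i = 0; i < numds; i++)
		if (dss[i].online)
			cand[ncand++] = i;
	while (nsel < level && ncand > 0) {
		int j = arc4random_uniform(ncand);

		sel[nsel++] = cand[j];
		cand[j] = cand[--ncand];	/* drop the chosen candidate */
	}
	return (nsel);
}

int
main(void)
{
	struct ds dss[NUMDS] = {
		{ "nfsv4-data0", true }, { "nfsv4-data1", true },
		{ "nfsv4-data2", true }, { "nfsv4-data3", true },
	};
	int sel[MIRROR_LEVEL];
	int n = pick_mirrors(dss, NUMDS, MIRROR_LEVEL, sel);

	for (int i = 0; i < n; i++)
		printf("copy %d of the data file -> %s\n", i, dss[sel[i]].name);
	return (0);
}

With #1, the equivalent choice is constrained to whole mirror sets, so the
loop above would pick one of the two sets rather than any two DSs.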
James Rose was asking about "manual migration". It is almost the same as
what is already done for mirror recovery and is a pretty trivial addition
for #2. For #1, it can be done, but is more work. "Manual migration" refers
to a sysadmin doing a command that moves a data file from one DS to another.
(Others that are more clever than I could use the "manual migration"
syscall to implement automagic versions to try and balance storage use and
I/O load.)
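Purely as an illustration of that idea (nothing below is the actual syscall
or its interface), a toy balancer might look like this, with
migrate_one_file() standing in for the migration primitive and a fixed
10Gbyte step standing in for real file sizes:

#include <stdio.h>

#define NUMDS	4

/*
 * Hypothetical stand-in for the "manual migration" primitive: move one
 * data file's storage from DS "from" to DS "to".
 */
static int
migrate_one_file(int from, int to)
{
	printf("migrate a data file: ds%d -> ds%d\n", from, to);
	return (0);
}

/*
 * Toy "automagic" balancer: while the gap in free space between the
 * fullest and emptiest DS exceeds a threshold, migrate one file from
 * the fullest to the emptiest.  freebytes[] would really come from
 * querying the DSs.
 */
static void
balance(long long *freebytes, int numds, long long threshold)
{
	for (;;) {
		int fullest = 0, emptiest = 0;

		for (int i = 1; i < numds; i++) {
			if (freebytes[i] < freebytes[fullest])
				fullest = i;
			if (freebytes[i] > freebytes[emptiest])
				emptiest = i;
		}
		if (freebytes[emptiest] - freebytes[fullest] <= threshold)
			break;
		if (migrate_one_file(fullest, emptiest) != 0)
			break;
		freebytes[fullest] += 10LL << 30;	/* toy 10G file */
		freebytes[emptiest] -= 10LL << 30;
	}
}

int
main(void)
{
	long long freebytes[NUMDS] =
	    { 50LL << 30, 200LL << 30, 180LL << 30, 60LL << 30 };

	balance(freebytes, NUMDS, 20LL << 30);
	return (0);
}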
Given easier migration and what I think is better handling of "out of
space" failures, I am leaning towards switching the code to #2.
What do others think? rick
