Date: Fri, 25 May 2018 21:14:04 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Cc: James Rose <james.rose@framestore.com>
Subject: pNFS mirror file distribution (big picture question?)
Message-ID: <YTXPR0101MB09595418D7B853DBCEB9458CDD690@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM>
Hi,

#1 The code currently in projects/pnfs-planb-server allows creation of sets of mirrored data servers (DSs). For example, the "-p" nfsd option argument:
   nfsv4-data0#nfsv4-data1,nfsv4-data2#nfsv4-data3
defines two mirrored sets of data servers with two servers in each one ("#" separates mirrors within a mirror set).

I did this a couple of years ago, in part because I thought having a well defined "mirror" for a DS would facilitate mirror recovery. Now that I have completed the mirror recovery code, having a defined mirror set is not needed.

#2 An alternate mirroring approach would be what I might call the random/distributed approach, where each file is distributed on any two (or more) of the DSs. For this approach, the "-p" nfsd option argument:
   nfsv4-data0,nfsv4-data1,nfsv4-data2,nfsv4-data3
defines four DSs, and a separate flag would say "two way mirroring", so each file would be placed on 2 of the 4 DSs.

The question is: should I switch the code to approach #2?

I won't call it elegant, but #1 is neat and tidy, since the sysadmin knows that a data file ends up on either nfsv4-data0/nfsv4-data1 or nfsv4-data2/nfsv4-data3. Assuming the mirrored DSs in a set have the same amount of storage, they will have the same amount of free space.
--> This implies that they will run out of space at the same time, and the pNFS service won't be able to write to files on that mirror set.

With #2, one of the DSs will probably run out of space first. I think this will make a client trying to write a file on it report a write error to the Metadata Server (MDS), and that will cause the DS to be taken offline. Then the write will succeed on the other mirror and things will continue to run. Eventually all the DSs will fill up, but hopefully a sysadmin can step in and fix the "out of space" problem before that point.

Another advantage I can see for #2 is that it gives the MDS more flexibility when it chooses which DSs to create the data files on than #1 does.
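To illustrate the difference between the two "-p" forms, here is a small sketch (Python used only for illustration; the real nfsd code is C, and all function names here are made up, not the actual implementation):

```python
import random

def parse_mirror_sets(p_arg):
    """Approach #1: ','-separated mirror sets, '#'-separated mirrors."""
    return [s.split('#') for s in p_arg.split(',')]

def parse_ds_pool(p_arg):
    """Approach #2: a flat ','-separated pool of DSs."""
    return p_arg.split(',')

def place_file_approach2(pool, nmirrors=2):
    """Pick any nmirrors distinct DSs from the pool for a new file."""
    return random.sample(pool, nmirrors)

sets = parse_mirror_sets("nfsv4-data0#nfsv4-data1,nfsv4-data2#nfsv4-data3")
# sets == [['nfsv4-data0', 'nfsv4-data1'], ['nfsv4-data2', 'nfsv4-data3']]

pool = parse_ds_pool("nfsv4-data0,nfsv4-data1,nfsv4-data2,nfsv4-data3")
pair = place_file_approach2(pool)   # any 2 distinct DSs from the pool
```

With #1 the placement choice is "which mirror set", and the mirrors within it are fixed; with #2 the MDS is free to pick any pair, which is where the extra flexibility comes from.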
(#2 will be less "neat and tidy", but the sysadmin can find out which DSs store the data for a file on the MDS on a "per file" basis.)

James Rose was asking about "manual migration". It is almost the same as what is already done for mirror recovery and is a pretty trivial addition for #2. For #1 it can be done, but it is more work. "Manual migration" refers to a sysadmin running a command that moves a data file from one DS to another.

(Others that are more clever than I am could use the "manual migration" syscall to implement automagic versions that try to balance storage use and I/O load.)

Given easier migration and what I think is better handling of "out of space" failures, I am leaning towards switching the code to #2.

What do others think? rick
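To make the "manual migration" idea concrete, the sequence is roughly: copy the data file to the destination DS, repoint the file's layout on the MDS, then remove the old copy. A toy sketch of that (hypothetical names; dicts stand in for the DSs' file systems and the MDS's layout attributes, which is not how the real server stores them):

```python
def migrate(ds_store, layouts, fname, src, dst):
    """Move fname's data file from DS 'src' to DS 'dst'.

    ds_store: dict mapping DS name -> {filename: data} (stands in for
    each DS's file system); layouts: dict mapping filename -> list of
    DSs holding it (stands in for the MDS's per-file layout metadata).
    """
    data = ds_store[src][fname]               # read from the source DS
    ds_store[dst][fname] = data               # copy to the destination DS
    layouts[fname] = [dst if d == src else d  # repoint the MDS layout
                      for d in layouts[fname]]
    del ds_store[src][fname]                  # only now remove the old copy

ds_store = {"nfsv4-data0": {"f1": b"abc"}, "nfsv4-data1": {"f1": b"abc"},
            "nfsv4-data2": {}, "nfsv4-data3": {}}
layouts = {"f1": ["nfsv4-data0", "nfsv4-data1"]}
migrate(ds_store, layouts, "f1", "nfsv4-data0", "nfsv4-data2")
# layouts["f1"] is now ["nfsv4-data2", "nfsv4-data1"]
```

Under #2 any DS in the pool is a legal destination; under #1 a move would also have to respect the mirror-set boundaries, which is part of why it is more work there.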