From: Rick Macklem
Date: Sun, 5 Aug 2018 20:14:08 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r337360 - head/usr.sbin/nfsd

Author: rmacklem
Date: Sun Aug 5 20:14:07 2018
New Revision: 337360
URL: https://svnweb.freebsd.org/changeset/base/337360

Log:
  Add a man page that describes the setup of a pNFS service.

  This is a content change.

Added:
  head/usr.sbin/nfsd/pnfsserver.4 (contents, props changed)

Added: head/usr.sbin/nfsd/pnfsserver.4
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/usr.sbin/nfsd/pnfsserver.4 Sun Aug 5 20:14:07 2018 (r337360)
@@ -0,0 +1,405 @@
+.\" Copyright (c) 2018 Rick Macklem
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd August 5, 2018
+.Dt PNFSSERVER 4
+.Os
+.Sh NAME
+.Nm pNFSserver
+.Nd NFS Version 4.1 Parallel NFS Protocol Server
+.Sh DESCRIPTION
+A set of FreeBSD servers may be configured to provide a
+.Xr pnfs 4
+service.
+One FreeBSD system needs to be configured as a MetaData Server (MDS) and
+at least one additional FreeBSD system needs to be configured as one or
+more Data Servers (DSs).
+.Pp
+These FreeBSD systems are configured to be NFSv4.1 servers; see
+.Xr nfsd 8
+and
+.Xr exports 5
+if you are not familiar with configuring an NFSv4.1 server.
+.Sh DS server configuration
+The DS(s) need to be configured as NFSv4.1 server(s), with a top level exported
+directory used for storage of data files.
+This directory must be owned by
+.Dq root
+and would normally have a mode of
+.Dq 700 .
+Within this directory there need to be additional directories named
+ds0,...,dsN (where N is 19 by default) also owned by
+.Dq root
+with mode
+.Dq 700 .
+These are the directories where the data files are stored.
+The following command can be run by root when in the top level exported
+directory to create these subdirectories.
+.Bd -literal -offset indent
+jot -w ds 20 0 | xargs mkdir -m 700
+.Ed
+.sp
+Note that
+.Dq 20
+is the default and can be set to a larger value on the MDS as shown below.
+.sp
+The top level exported directory used for storage of data files must be
+exported to the MDS with the
+.Dq maproot=root sec=sys
+export options so that the MDS can create entries in these subdirectories.
+It must also be exported to all pNFS aware clients, but these clients do
+not require the
+.Dq maproot=root
+export option and this directory should be exported to them with the same
+options as used by the MDS to export file system(s) to the clients.
+.Pp
+It is possible to have multiple DSs on the same FreeBSD system, but each
+of these DSs must have a separate top level exported directory used for storage
+of data files and each
+of these DSs must be mountable via a separate IP address.
+Alias addresses can be set on the DS server system for a network
+interface via
+.Xr ifconfig 8
+to create these different IP addresses.
+Multiple DSs on the same server may be useful when data for different file systems
+on the MDS are being stored on different file system volumes on the FreeBSD
+DS system.
+.Sh MDS server configuration
+The MDS must be a separate FreeBSD system from the FreeBSD DS system(s) and
+NFS clients.
+It is configured as an NFSv4.1 server with file system(s) exported to
+clients.
+However, the
+.Dq -p
+command line argument for
+.Xr nfsd 8
+is used to indicate that it is running as the MDS for a pNFS server.
+.Pp
+The DS(s) must all be mounted on the MDS using the following mount options:
+.Bd -literal -offset indent
+nfsv4,minorversion=1,soft,retrans=2
+.Ed
+.sp
+so that they can be defined as DSs in the
+.Dq -p
+option.
+Normally these mounts would be entered in the
+.Xr fstab 5
+on the MDS.
+For example, if there are four DSs named nfsv4-data[0-3], the
+.Xr fstab 5
+lines might look like:
+.Bd -literal -offset
+nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
+nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
+nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
+nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
+.Ed
+.sp
+The
+.Xr nfsd 8
+command line option
+.Dq -p
+indicates that the NFS server is a pNFS MDS and specifies what
+DSs are to be used.
+.br
+For the above
+.Xr fstab 5
+example, the
+.Xr nfsd 8
+nfs_server_flags line in your
+.Xr rc.conf 5
+might look like:
+.Bd -literal -offset
+nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
+.Ed
+.sp
+This example specifies that the data files should be distributed over the
+four DSs and File layouts will be issued to pNFS enabled clients.
+If issuing Flexible File layouts is desired for this case, setting the sysctl
+.Dq vfs.nfsd.default_flexfile
+non-zero in your
+.Xr sysctl.conf 5
+file will make the
+.Nm
+do that.
+.br
+Alternately, this variant of
+.Dq nfs_server_flags
+will specify that two way mirroring is to be done, via the
+.Dq -m
+command line option.
+.Bd -literal -offset
+nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"
+.Ed
+.sp
+With two way mirroring, the data file for each exported file on the MDS
+will be stored on two of the DSs.
+When mirroring is enabled, the server will always issue Flexible File layouts.
+.Pp
+It is also possible to specify which DSs are to be used to store data files for
+specific exported file systems on the MDS.
+For example, if the MDS has exported two file systems
+.Dq /export1
+and
+.Dq /export2
+to clients, the following variant of
+.Dq nfs_server_flags
+will specify that data files for
+.Dq /export1
+will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for
+.Dq /export2
+will be stored on nfsv4-data2 and nfsv4-data3.
+.Bd -literal -offset
+nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
+.Ed
+.sp
+This can be used by system administrators to control where data files are
+stored and might be useful for control of storage use.
+For this case, it may be convenient to co-locate more than one of the DSs
+on the same FreeBSD server, using separate file systems on the DS system
+for storage of the respective DS's data files.
+If mirroring is desired for this case, the
+.Dq -m
+option also needs to be specified.
+There must be enough DSs assigned to each exported file system on the MDS
+to support the level of mirroring.
+The above example would be fine for two way mirroring, but four way mirroring
+would not work, since there are only two DSs assigned to each exported file
+system on the MDS.
+.Pp
+The number of subdirectories in each DS is defined by the
+.Dq vfs.nfs.dsdirsize
+sysctl on the MDS.
+This value can be increased from the default of 20, but only when the
+.Xr nfsd 8
+is not running and after the additional ds20,... subdirectories have been
+created on all the DSs.
+For a service that will store a large number of files this sysctl should be
+set much larger, to keep the number of entries in a subdirectory from
+getting too large.
+.Sh Client mounts
+Once operational, NFSv4.1 FreeBSD client mounts done with the
+.Dq pnfs
+option should do I/O directly on the DSs.
+The clients mounting the MDS must be running the
+.Xr nfscbd 8
+daemon for pNFS to work.
+Set
+.Bd -literal -offset indent
+nfscbd_enable="YES"
+.Ed
+.sp
+in the
+.Xr rc.conf 5
+on these clients.
+Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the MDS,
+which acts as a proxy for the appropriate DS(s).
+.Sh Backing up a pNFS service
+Since the data is separated from the metadata, the simple way to back up
+a pNFS service is to do so from an NFS client that has the service mounted
+on it.
+If you back up the MDS exported file system(s) on the MDS, you must do it
+in such a way that the
+.Dq system
+namespace extended attributes get backed up.
+.Sh Handling of failed mirrored DSs
+When a mirrored DS fails, it can be disabled in one of three ways:
+.sp
+1 - The MDS detects a problem when trying to do proxy
+operations on the DS.
+This can take a couple of minutes
+after the DS failure or network partitioning occurs.
+.sp
+2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in
+the arguments for a LayoutReturn operation.
+.sp
+3 - The system administrator can perform the pnfsdskill(8) command on the MDS
+to disable it.  If the system administrator does a pnfsdskill(8) and it fails
+with ENXIO (Device not configured) that normally means the DS was already
+disabled via #1 or #2.  Since doing this is harmless, once a system
+administrator knows that there is a problem with a mirrored DS, doing the
+command is recommended.
+.sp
+Once a system administrator knows that a mirrored DS has malfunctioned
+or has been network partitioned, they should do the following as root/su
+on the MDS:
+.Bd -literal -offset indent
+# pnfsdskill
+# umount -N
+.Ed
+.sp
+Note that the path argument to both commands must be the exact mounted-on path
+string used when the DS was mounted on the MDS.
+.Pp
+Once the mirrored DS has been disabled, the pNFS service should continue to
+function, but file updates will only happen on the DS(s)
+that have not been disabled.  Assuming two way mirroring, that implies
+the one DS of the pair stored in the
+.Dq pnfsd.dsfile
+extended attribute for the file on the MDS, for files stored on the disabled DS.
+.Pp
+The next step is to clear the IP address in the
+.Dq pnfsd.dsfile
+extended attribute on all files on the MDS for the failed DS.
+This is done so that, when the disabled DS is repaired and brought back online,
+the data files on this DS will not be used, since they may be out of date.
+The command that clears the IP address is
+.Xr pnfsdsfile 8
+with the
+.Dq -r
+option.
+.Bd -literal -offset
+For example:
+# pnfsdsfile -r nfsv4-data3 yyy.c
+yyy.c: nfsv4-data2.home.rick ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000 0.0.0.0 ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
+.Ed
+.sp
+replaces nfsv4-data3 with an IPv4 address of 0.0.0.0, so that nfsv4-data3
+will not get used.
+.Pp
+Normally this will be called within a
+.Xr find 1
+command for all regular
+files in the exported directory tree and must be done on the MDS.
+When used with
+.Xr find 1 ,
+you will probably also want the
+.Dq -q
+option so that it won't spit out the results for every file.
+If the disabled/repaired DS is nfsv4-data3, the commands done on the MDS
+would be:
+.Bd -literal -offset
+# cd
+# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \;
+.Ed
+.sp
+There is a problem with the above command if the file found by
+.Xr find 1
+is renamed or unlinked before the
+.Xr pnfsdsfile 8
+command is done on it.
+This should normally generate an error message.
+A simple unlink is harmless
+but a link/unlink or rename might result in the file not having been processed
+under its new name.
+To check that all files have their IP addresses set to 0.0.0.0 these
+commands can be used (assuming the
+.Xr sh 1
+shell):
+.Bd -literal -offset
+# cd
+# find . -type f -exec pnfsdsfile {} \; | sed "/nfsv4-data3/!d"
+.Ed
+.sp
+Any line(s) printed require the
+.Xr pnfsdsfile 8
+with
+.Dq -r
+to be done again.
+Once this is done, the replaced/repaired DS can be brought back online.
+It should have empty ds0,...,dsN directories under the top level exported
+directory for storage of data files just like it did when first set up.
+Mount it on the MDS exactly as you did before disabling it.
+For the nfsv4-data3 example, the command would be:
+.Bd -literal -offset
+# mount -t nfs -o nfsv4,minorversion=1,soft,retrans=2 nfsv4-data3:/ /data3
+.Ed
+.sp
+Then restart the nfsd to re-enable the DS.
+.Bd -literal -offset
+# /etc/rc.d/nfsd restart
+.Ed
+.sp
+Now, new files can be stored on nfsv4-data3,
+but files with the IP address zeroed out on the MDS will not yet use the
+repaired DS (nfsv4-data3).
+The next step is to go through the exported file tree on the MDS and,
+for each of the
+files with an IPv4 address of 0.0.0.0 in its extended attribute, copy the file
+data to the repaired DS and re-enable use of this mirror for it.
+The command for copying the file data for one MDS file is
+.Xr pnfsdscopymr 8
+and it will also normally be used in a
+.Xr find 1 .
+For the example case, the commands on the MDS would be:
+.Bd -literal -offset
+# cd
+# find . -type f -exec pnfsdscopymr -r /data3 {} \;
+.Ed
+.sp
+When this completes, the recovery should be complete or at least nearly so.
+As noted above, if a link/unlink or rename occurs on a file name while the
+above
+.Xr find 1
+is in progress, it may not get copied.
+To check for any file(s) not yet copied, the commands are:
+.Bd -literal -offset
+# cd
+# find . -type f -exec pnfsdsfile {} \; | sed "/0\.0\.0\.0/!d"
+.Ed
+.sp
+If this command prints out any file name(s), these files must
+have the
+.Xr pnfsdscopymr 8
+command done on them to complete the recovery.
+.Bd -literal -offset
+# pnfsdscopymr -r /data3
+.Ed
+.sp
+All of these commands are designed to be
+done while the pNFS service is running and can be re-run safely.
+.Pp
+For a more detailed discussion of the setup and management of a pNFS service
+see:
+.Bd -literal -offset indent
+http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
+.Ed
+.sp
+.Sh SEE ALSO
+.Xr nfsv4 4 ,
+.Xr pnfs 4 ,
+.Xr exports 5 ,
+.Xr fstab 5 ,
+.Xr rc.conf 5 ,
+.Xr sysctl.conf 5 ,
+.Xr nfscbd 8 ,
+.Xr nfsd 8 ,
+.Xr nfsuserd 8 ,
+.Xr pnfsdscopymr 8 ,
+.Xr pnfsdsfile 8 ,
+.Xr pnfsdskill 8
+.Sh HISTORY
+The
+.Nm
+command first appeared in
+.Fx 12.0 .
+.Sh BUGS
+Since the MDS cannot be mirrored, it is a single point of failure just
+as a non
+.Tn pNFS
+server is.
+For non-mirrored configurations, all FreeBSD systems used in the service
+are single points of failure.
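
The man page above states that each exported file system on the MDS needs at
least as many DSs assigned to it as the mirroring level given with -m. As a
rough illustration only, the POSIX sh sketch below counts the DSs per export
in a -p argument and reports any export that falls short. The function name
check_mirror_level is hypothetical and not part of FreeBSD. Grouping DS
entries that carry no #export suffix under a single "(all)" bucket is an
assumption made for this sketch, not documented nfsd behaviour.

```shell
#!/bin/sh
# check_mirror_level SPEC LEVEL   (hypothetical helper, not a FreeBSD tool)
#   SPEC  - the comma-separated DS list that would follow "nfsd -p"
#   LEVEL - the mirroring level that would follow "nfsd -m"
# Prints one line per export that has fewer than LEVEL DSs assigned and
# returns non-zero if any such export exists.
check_mirror_level() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F'#' -v level="$2" '
        # Each input line is one DS entry: host:/mountpoint[#/export].
        # Assumption: entries without a #export suffix share one bucket.
        NF > 0 {
            key = (NF >= 2 ? $2 : "(all)")
            count[key]++
        }
        END {
            bad = 0
            for (k in count) {
                if (count[k] < level) {
                    printf "%s: %d DS(s), need %d\n", k, count[k], level
                    bad = 1
                }
            }
            exit bad
        }'
}
```

Called with the /export1 and /export2 example from the man page and a level
of 2, the function returns success; with a level of 4 it reports both exports
and returns failure, matching the statement that four way mirroring would not
work with only two DSs per exported file system.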