Date:      Tue, 12 Jun 2018 19:36:33 +0000 (UTC)
From:      Rick Macklem <rmacklem@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r335012 - in head/sys: fs/nfs fs/nfsclient fs/nfsserver nfs
Message-ID:  <201806121936.w5CJaXFs086620@repo.freebsd.org>

Author: rmacklem
Date: Tue Jun 12 19:36:32 2018
New Revision: 335012
URL: https://svnweb.freebsd.org/changeset/base/335012

Log:
  Merge the pNFS server code from projects/pnfs-planb-server into head.
  
  This code merge adds a pNFS service to the NFSv4.1 server. Although it is
  a large commit it should not affect behaviour for a non-pNFS NFS server.
  Some documentation on how this works can be found at:
  http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
  and will hopefully be turned into a proper document soon.
  This is a merge of the kernel code. Userland and man page changes will
  come soon, once the dust settles on this merge.
  It has passed a "make universe", so I hope it will not cause build problems.
  It also adds NFSv4.1 server support for the "current stateid".
  
  Here is a brief overview of the pNFS service:
  A pNFS service separates the Read/Write operations from all the other NFSv4.1
  Metadata operations. It is hoped that this separation allows a pNFS service
  to be configured that exceeds the limits of a single NFS server for
  storage capacity and/or I/O bandwidth.
  It is possible to configure mirroring within the data servers (DSs) so that
  the data storage file for an MDS file will be mirrored on two or more of
  the DSs.
  When this is used, failure of a DS will not stop the pNFS service and a
  failed DS can be recovered once repaired while the pNFS service continues
  to operate.  Although two way mirroring would be the norm, it is possible
  to set a mirroring level of up to four or the number of DSs, whichever is
  less.
  The Metadata server will always be a single point of failure,
  just as a single NFS server is.
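
  The mirroring limit described above ("up to four or the number of DSs,
  whichever is less") can be sketched as follows. This is only an
  illustration: clamp_mirror_level() and MAX_MIRRORS are hypothetical names,
  and whether the server clamps an out-of-range value or rejects it is an
  assumption, not something stated in this log.

```c
#include <assert.h>

#define	MAX_MIRRORS	4	/* same value as NFSDEV_MAXMIRRORS in nfs.h */

/*
 * Hypothetical helper: limit a requested mirror level to
 * min(MAX_MIRRORS, ds_count), with a floor of 1 (no mirroring).
 * Clamping (rather than failing) is an assumption of this sketch.
 */
static int
clamp_mirror_level(int requested, int ds_count)
{
	int limit;

	assert(ds_count >= 1);	/* a pNFS service needs at least one DS */
	limit = ds_count < MAX_MIRRORS ? ds_count : MAX_MIRRORS;
	if (requested < 1)
		return (1);
	return (requested > limit ? limit : requested);
}
```

  With six DSs a requested level of 8 would come back as 4, and with three
  DSs a requested level of 4 would come back as 3.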
  
  A Plan B pNFS service consists of a single MetaData Server (MDS) and K
  Data Servers (DS), all of which are recent FreeBSD systems.
  Clients will mount the MDS as they would a single NFS server.
  When files are created, the MDS creates a file tree identical to what a
  single NFS server creates, except that all the regular (VREG) files will
  be empty. As such, if you look at the exported tree on the MDS directly
  on the MDS server (not via an NFS mount), the files will all be of size 0.
  Each of these files will also have two extended attributes in the system
  attribute name space:
  pnfsd.dsfile - This extended attribute stores the information that
      the MDS needs to find the data storage file(s) on DS(s) for this file.
  pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
      and Change attributes for the file, so that the MDS doesn't need to
      acquire the attributes from the DS for every Getattr operation.
  For each regular (VREG) file, the MDS creates a data storage file on one
  (or more if mirroring is enabled) of the DSs in one of the "dsNN"
  subdirectories.  The name of this file is the file handle
  of the file on the MDS in hexadecimal so that the name is unique.
  The DSs use subdirectories named "ds0" to "dsN" so that no one directory
  gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
  on the MDS, with the default being 20.
  For production servers that will store a lot of files, this value should
  probably be much larger.
  It can be increased when the "nfsd" daemon is not running on the MDS,
  once the "dsK" directories are created.
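
  To make the naming scheme above concrete, here is a rough userland sketch
  of how a data storage file path could be derived from an MDS file handle.
  The hexadecimal name follows the description above. Everything else is an
  assumption made for illustration: the function names are hypothetical, the
  hex digits are lowercase, the directories are indexed 0 to dsdirsize - 1,
  and the byte-sum hash used to pick a "dsN" subdirectory is a stand-in,
  since the log does not say how the subdirectory is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the bytes of a file handle as hexadecimal digits into name.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
fh_to_hexname(const uint8_t *fh, size_t fhlen, char *name, size_t namelen)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(fh != NULL && name != NULL);
	if (namelen < 2 * fhlen + 1)
		return (-1);
	for (i = 0; i < fhlen; i++) {
		name[2 * i] = digits[fh[i] >> 4];
		name[2 * i + 1] = digits[fh[i] & 0x0f];
	}
	name[2 * fhlen] = '\0';
	return (0);
}

/*
 * Hypothetical subdirectory choice: sum of the handle bytes modulo the
 * number of "dsN" directories.  The real rule is not given in the log.
 */
static unsigned int
fh_to_dsdir(const uint8_t *fh, size_t fhlen, unsigned int dsdirsize)
{
	unsigned int sum = 0;
	size_t i;

	assert(dsdirsize > 0);
	for (i = 0; i < fhlen; i++)
		sum += fh[i];
	return (sum % dsdirsize);
}

/* Build the relative path "ds<N>/<hexname>"; returns 0 on success. */
static int
ds_relpath(const uint8_t *fh, size_t fhlen, unsigned int dsdirsize,
    char *path, size_t pathlen)
{
	char hex[129];	/* room for a 64 byte handle plus the NUL */
	int n;

	if (fh_to_hexname(fh, fhlen, hex, sizeof(hex)) != 0)
		return (-1);
	n = snprintf(path, pathlen, "ds%u/%s",
	    fh_to_dsdir(fh, fhlen, dsdirsize), hex);
	return ((n < 0 || (size_t)n >= pathlen) ? -1 : 0);
}
```

  Because the name is derived from the whole file handle, two different MDS
  files cannot collide on a DS, and spreading the names over the
  subdirectories keeps any one directory from growing too large.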
  
  For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
  of information to the client that allows it to do I/O directly to the DS.
  DeviceInfo - This is relatively static information that defines what a DS
               is. The critical bits of information returned by the FreeBSD
               server are the IP address of the DS and, for the Flexible
               File layout, that NFSv4.1 is to be used and that it is
               "tightly coupled".
               There is a "deviceid" which identifies the DeviceInfo.
  Layout     - This is per file and can be recalled by the server when it
               is no longer valid. For the FreeBSD server, there is support
               for two types of layout, called File and Flexible File layout.
               Both allow the client to do I/O on the DS via NFSv4.1 I/O
               operations. The Flexible File layout is a more recent variant
               that allows specification of mirrors, where the client is
               expected to do writes to all mirrors to maintain them in a
               consistent state. The Flexible File layout also allows the
               client to report I/O errors for a DS back to the MDS.
               The Flexible File layout supports two variants referred to as
               "tightly coupled" vs "loosely coupled". The FreeBSD server always
               uses the "tightly coupled" variant where the client uses the
               same credentials to do I/O on the DS as it would on the MDS.
               For the "loosely coupled" variant, the layout specifies a
               synthetic user/group that the client uses to do I/O on the DS.
               The FreeBSD server does not do striping and always returns
               layouts for the entire file. The critical information in a layout
               is Read vs Read/Write and DeviceID(s) that identify which
               DS(s) the data is stored on.
  
  At this time, the MDS generates File Layout layouts to NFSv4.1 clients
  that know how to do pNFS for the non-mirrored DS case unless the sysctl
  vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
  layouts are generated.
  The mirrored DS configuration always generates Flexible File layouts.
  For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
  are done against the MDS which acts as a proxy for the appropriate DS(s).
  When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
  If the DS were on the same machine as the MDS, that proxied RPC would arrive
  back at the same server, which would proxy it again and so on, until the
  machine runs out of some resource, such as session slots or mbufs.
  As such, DSs must be separate systems from the MDS.
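
  The layout type selection described at the start of this paragraph reduces
  to one test, which matches the condition used in the nfsv4_fillattr()
  change in this commit (nfsrv_doflexfile != 0 || nfsrv_maxpnfsmirror > 1).
  The sketch below restates it outside the kernel. choose_layout() and the
  enum names are hypothetical; the numeric values are the layout type numbers
  from RFC 5661 and RFC 8435.

```c
#include <assert.h>

/* Layout type numbers as assigned by RFC 5661 and RFC 8435. */
enum layout_type {
	LAYOUT_FILE = 1,	/* LAYOUT4_NFSV4_1_FILES */
	LAYOUT_FLEXFILE = 4	/* LAYOUT4_FLEX_FILES */
};

/*
 * Hypothetical helper: pick the layout type to hand to a pNFS client.
 * default_flexfile stands in for the vfs.nfsd.default_flexfile sysctl
 * and mirror_level for the configured mirroring level.
 */
static enum layout_type
choose_layout(int default_flexfile, int mirror_level)
{

	assert(mirror_level >= 1);
	if (default_flexfile != 0 || mirror_level > 1)
		return (LAYOUT_FLEXFILE);
	return (LAYOUT_FILE);
}
```

  So a non-mirrored service with the sysctl left at zero yields File Layout,
  while either setting the sysctl or enabling mirroring yields Flexible File
  layouts.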
  
  Tested by:	james.rose@framestore.com
  Relnotes:	yes

Modified:
  head/sys/fs/nfs/nfs.h
  head/sys/fs/nfs/nfs_commonacl.c
  head/sys/fs/nfs/nfs_commonport.c
  head/sys/fs/nfs/nfs_commonsubs.c
  head/sys/fs/nfs/nfs_var.h
  head/sys/fs/nfs/nfsport.h
  head/sys/fs/nfs/nfsproto.h
  head/sys/fs/nfs/nfsrvstate.h
  head/sys/fs/nfsclient/nfs_clport.c
  head/sys/fs/nfsclient/nfs_clrpcops.c
  head/sys/fs/nfsclient/nfs_clstate.c
  head/sys/fs/nfsclient/nfs_clvfsops.c
  head/sys/fs/nfsserver/nfs_nfsdkrpc.c
  head/sys/fs/nfsserver/nfs_nfsdport.c
  head/sys/fs/nfsserver/nfs_nfsdserv.c
  head/sys/fs/nfsserver/nfs_nfsdsocket.c
  head/sys/fs/nfsserver/nfs_nfsdstate.c
  head/sys/fs/nfsserver/nfs_nfsdsubs.c
  head/sys/nfs/nfs_nfssvc.c
  head/sys/nfs/nfssvc.h

Modified: head/sys/fs/nfs/nfs.h
==============================================================================
--- head/sys/fs/nfs/nfs.h	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfs.h	Tue Jun 12 19:36:32 2018	(r335012)
@@ -98,6 +98,7 @@
 #define	NFSSESSIONHASHSIZE	20	/* Size of server session hash table */
 #endif
 #define	NFSSTATEHASHSIZE	10	/* Size of server stateid hash table */
+#define	NFSLAYOUTHIGHWATER	1000000	/* Upper limit for # of layouts */
 #ifndef	NFSCLDELEGHIGHWATER
 #define	NFSCLDELEGHIGHWATER	10000	/* limit for client delegations */
 #endif
@@ -171,11 +172,20 @@ struct nfsd_addsock_args {
 
 /*
  * nfsd argument for new krpc.
+ * (New version supports pNFS, indicated by NFSSVC_NEWSTRUCT flag.)
  */
 struct nfsd_nfsd_args {
 	const char *principal;	/* GSS-API service principal name */
 	int	minthreads;	/* minimum service thread count */
 	int	maxthreads;	/* maximum service thread count */
+	int	version;	/* Allow multiple variants */
+	char	*addr;		/* pNFS DS addresses */
+	int	addrlen;	/* Length of addrs */
+	char	*dnshost;	/* DNS names for DS addresses */
+	int	dnshostlen;	/* Length of DNS names */
+	char	*dspath;	/* DS Mount path on MDS */
+	int	dspathlen;	/* Length of DS Mount path on MDS */
+	int	mirrorcnt;	/* Number of mirrors to create on DSs */
 };
 
 /*
@@ -186,6 +196,23 @@ struct nfsd_nfsd_args {
 #define	NFSDEV_MAXMIRRORS	4
 #define	NFSDEV_MAXVERS		4
 
+struct nfsd_pnfsd_args {
+	int	op;		/* Which pNFSd op to perform. */
+	char	*mdspath;	/* Path of MDS file. */
+	char	*dspath;	/* Path of recovered DS mounted on dir. */
+	char	*curdspath;	/* Path of current DS mounted on dir. */
+};
+
+#define	PNFSDOP_DELDSSERVER	1
+#define	PNFSDOP_COPYMR		2
+
+/* Old version. */
+struct nfsd_nfsd_oargs {
+	const char *principal;	/* GSS-API service principal name */
+	int	minthreads;	/* minimum service thread count */
+	int	maxthreads;	/* maximum service thread count */
+};
+
 /*
  * Arguments for use by the callback daemon.
  */
@@ -593,8 +620,8 @@ struct nfsrv_descript {
 	NFSSOCKADDR_T		nd_nam2;	/* return socket addr */
 	caddr_t			nd_dpos;	/* Current dissect pos */
 	caddr_t			nd_bpos;	/* Current build pos */
+	u_int64_t		nd_flag;	/* nd_flag */
 	u_int16_t		nd_procnum;	/* RPC # */
-	u_int32_t		nd_flag;	/* nd_flag */
 	u_int32_t		nd_repstat;	/* Reply status */
 	int			*nd_errp;	/* Pointer to ret status */
 	u_int32_t		nd_retxid;	/* Reply xid */
@@ -613,6 +640,8 @@ struct nfsrv_descript {
 	uint32_t		nd_slotid;	/* Slotid for this RPC */
 	SVCXPRT			*nd_xprt;	/* Server RPC handle */
 	uint32_t		*nd_sequence;	/* Sequence Op. ptr */
+	nfsv4stateid_t		nd_curstateid;	/* Current StateID */
+	nfsv4stateid_t		nd_savedcurstateid; /* Saved Current StateID */
 };
 
 #define	nd_princlen	nd_gssnamelen
@@ -649,6 +678,9 @@ struct nfsrv_descript {
 #define	ND_CACHETHIS		0x08000000
 #define	ND_LASTOP		0x10000000
 #define	ND_LOOPBADSESS		0x20000000
+#define	ND_DSSERVER		0x40000000
+#define	ND_CURSTATEID		0x80000000
+#define	ND_SAVEDCURSTATEID	0x100000000
 
 /*
  * ND_GSS should be the "or" of all GSS type authentications.

Modified: head/sys/fs/nfs/nfs_commonacl.c
==============================================================================
--- head/sys/fs/nfs/nfs_commonacl.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfs_commonacl.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -450,36 +450,6 @@ nfsrv_buildacl(struct nfsrv_descript *nd, NFSACL_T *ac
 }
 
 /*
- * Set an NFSv4 acl.
- */
-APPLESTATIC int
-nfsrv_setacl(vnode_t vp, NFSACL_T *aclp, struct ucred *cred,
-    NFSPROC_T *p)
-{
-	int error;
-
-	if (nfsrv_useacl == 0 || nfs_supportsnfsv4acls(vp) == 0) {
-		error = NFSERR_ATTRNOTSUPP;
-		goto out;
-	}
-	/*
-	 * With NFSv4 ACLs, chmod(2) may need to add additional entries.
-	 * Make sure it has enough room for that - splitting every entry
-	 * into two and appending "canonical six" entries at the end.
-	 * Cribbed out of kern/vfs_acl.c - Rick M.
-	 */
-	if (aclp->acl_cnt > (ACL_MAX_ENTRIES - 6) / 2) {
-		error = NFSERR_ATTRNOTSUPP;
-		goto out;
-	}
-	error = VOP_SETACL(vp, ACL_TYPE_NFS4, aclp, cred, p);
-
-out:
-	NFSEXITCODE(error);
-	return (error);
-}
-
-/*
  * Compare two NFSv4 acls.
  * Return 0 if they are the same, 1 if not the same.
  */

Modified: head/sys/fs/nfs/nfs_commonport.c
==============================================================================
--- head/sys/fs/nfs/nfs_commonport.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfs_commonport.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -69,6 +69,9 @@ int nfscl_debuglevel = 0;
 char nfsv4_callbackaddr[INET6_ADDRSTRLEN];
 struct callout newnfsd_callout;
 int nfsrv_lughashsize = 100;
+struct mtx nfsrv_dslock_mtx;
+struct nfsdevicehead nfsrv_devidhead;
+volatile int nfsrv_devidcnt = 0;
 void (*nfsd_call_servertimer)(void) = NULL;
 void (*ncl_call_invalcaches)(struct vnode *) = NULL;
 
@@ -768,6 +771,8 @@ nfscommon_modevent(module_t mod, int type, void *data)
 		mtx_init(&nfs_req_mutex, "nfs_req_mutex", NULL, MTX_DEF);
 		mtx_init(&nfsrv_nfsuserdsock.nr_mtx, "nfsuserd", NULL,
 		    MTX_DEF);
+		mtx_init(&nfsrv_dslock_mtx, "nfs4ds", NULL, MTX_DEF);
+		TAILQ_INIT(&nfsrv_devidhead);
 		callout_init(&newnfsd_callout, 1);
 		newnfs_init();
 		nfsd_call_nfscommon = nfssvc_nfscommon;
@@ -794,6 +799,7 @@ nfscommon_modevent(module_t mod, int type, void *data)
 		mtx_destroy(&nfs_slock_mutex);
 		mtx_destroy(&nfs_req_mutex);
 		mtx_destroy(&nfsrv_nfsuserdsock.nr_mtx);
+		mtx_destroy(&nfsrv_dslock_mtx);
 		loaded = 0;
 		break;
 	default:

Modified: head/sys/fs/nfs/nfs_commonsubs.c
==============================================================================
--- head/sys/fs/nfs/nfs_commonsubs.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfs_commonsubs.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -70,15 +70,24 @@ gid_t nfsrv_defaultgid = GID_NOGROUP;
 int nfsrv_lease = NFSRV_LEASE;
 int ncl_mbuf_mlen = MLEN;
 int nfsd_enable_stringtouid = 0;
+int nfsrv_doflexfile = 0;
 static int nfs_enable_uidtostring = 0;
 NFSNAMEIDMUTEX;
 NFSSOCKMUTEX;
 extern int nfsrv_lughashsize;
+extern struct mtx nfsrv_dslock_mtx;
+extern volatile int nfsrv_devidcnt;
+extern int nfscl_debuglevel;
+extern struct nfsdevicehead nfsrv_devidhead;
 
 SYSCTL_DECL(_vfs_nfs);
 SYSCTL_INT(_vfs_nfs, OID_AUTO, enable_uidtostring, CTLFLAG_RW,
     &nfs_enable_uidtostring, 0, "Make nfs always send numeric owner_names");
 
+int nfsrv_maxpnfsmirror = 1;
+SYSCTL_INT(_vfs_nfs, OID_AUTO, pnfsmirror, CTLFLAG_RD,
+    &nfsrv_maxpnfsmirror, 0, "Mirror level for pNFS service");
+
 /*
  * This array of structures indicates, for V4:
  * retfh - which of 3 types of calling args are used
@@ -487,7 +496,7 @@ nfsm_fhtom(struct nfsrv_descript *nd, u_int8_t *fhp, i
 {
 	u_int32_t *tl;
 	u_int8_t *cp;
-	int fullsiz, bytesize = 0;
+	int fullsiz, rem, bytesize = 0;
 
 	if (size == 0)
 		size = NFSX_MYFH;
@@ -504,6 +513,7 @@ nfsm_fhtom(struct nfsrv_descript *nd, u_int8_t *fhp, i
 	case ND_NFSV3:
 	case ND_NFSV4:
 		fullsiz = NFSM_RNDUP(size);
+		rem = fullsiz - size;
 		if (set_true) {
 		    bytesize = 2 * NFSX_UNSIGNED + fullsiz;
 		    NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED);
@@ -1768,6 +1778,40 @@ nfsv4_loadattr(struct nfsrv_descript *nd, vnode_t vp,
 			}
 			attrsum += cnt;
 			break;
+		case NFSATTRBIT_FSLAYOUTTYPE:
+		case NFSATTRBIT_LAYOUTTYPE:
+			NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
+			attrsum += NFSX_UNSIGNED;
+			i = fxdr_unsigned(int, *tl);
+			if (i > 0) {
+				NFSM_DISSECT(tl, u_int32_t *, i *
+				    NFSX_UNSIGNED);
+				attrsum += i * NFSX_UNSIGNED;
+				j = fxdr_unsigned(int, *tl);
+				if (i == 1 && compare && !(*retcmpp) &&
+				    (((nfsrv_doflexfile != 0 ||
+				       nfsrv_maxpnfsmirror > 1) &&
+				      j != NFSLAYOUT_FLEXFILE) ||
+				    (nfsrv_doflexfile == 0 &&
+				     j != NFSLAYOUT_NFSV4_1_FILES)))
+					*retcmpp = NFSERR_NOTSAME;
+			}
+			if (nfsrv_devidcnt == 0) {
+				if (compare && !(*retcmpp) && i > 0)
+					*retcmpp = NFSERR_NOTSAME;
+			} else {
+				if (compare && !(*retcmpp) && i != 1)
+					*retcmpp = NFSERR_NOTSAME;
+			}
+			break;
+		case NFSATTRBIT_LAYOUTALIGNMENT:
+		case NFSATTRBIT_LAYOUTBLKSIZE:
+			NFSM_DISSECT(tl, u_int32_t *, NFSX_UNSIGNED);
+			attrsum += NFSX_UNSIGNED;
+			i = fxdr_unsigned(int, *tl);
+			if (compare && !(*retcmpp) && i != NFS_SRVMAXIO)
+				*retcmpp = NFSERR_NOTSAME;
+			break;
 		default:
 			printf("EEK! nfsv4_loadattr unknown attr=%d\n",
 				bitpos);
@@ -2024,7 +2068,8 @@ APPLESTATIC int
 nfsv4_fillattr(struct nfsrv_descript *nd, struct mount *mp, vnode_t vp,
     NFSACL_T *saclp, struct vattr *vap, fhandle_t *fhp, int rderror,
     nfsattrbit_t *attrbitp, struct ucred *cred, NFSPROC_T *p, int isdgram,
-    int reterr, int supports_nfsv4acls, int at_root, uint64_t mounted_on_fileno)
+    int reterr, int supports_nfsv4acls, int at_root, uint64_t mounted_on_fileno,
+    struct statfs *pnfssf)
 {
 	int bitpos, retnum = 0;
 	u_int32_t *tl;
@@ -2426,25 +2471,45 @@ nfsv4_fillattr(struct nfsrv_descript *nd, struct mount
 			break;
 		case NFSATTRBIT_SPACEAVAIL:
 			NFSM_BUILD(tl, u_int32_t *, NFSX_HYPER);
-			if (priv_check_cred(cred, PRIV_VFS_BLOCKRESERVE, 0))
-				uquad = (u_int64_t)fs->f_bfree;
+			if (priv_check_cred(cred, PRIV_VFS_BLOCKRESERVE, 0)) {
+				if (pnfssf != NULL)
+					uquad = (u_int64_t)pnfssf->f_bfree;
+				else
+					uquad = (u_int64_t)fs->f_bfree;
+			} else {
+				if (pnfssf != NULL)
+					uquad = (u_int64_t)pnfssf->f_bavail;
+				else
+					uquad = (u_int64_t)fs->f_bavail;
+			}
+			if (pnfssf != NULL)
+				uquad *= pnfssf->f_bsize;
 			else
-				uquad = (u_int64_t)fs->f_bavail;
-			uquad *= fs->f_bsize;
+				uquad *= fs->f_bsize;
 			txdr_hyper(uquad, tl);
 			retnum += NFSX_HYPER;
 			break;
 		case NFSATTRBIT_SPACEFREE:
 			NFSM_BUILD(tl, u_int32_t *, NFSX_HYPER);
-			uquad = (u_int64_t)fs->f_bfree;
-			uquad *= fs->f_bsize;
+			if (pnfssf != NULL) {
+				uquad = (u_int64_t)pnfssf->f_bfree;
+				uquad *= pnfssf->f_bsize;
+			} else {
+				uquad = (u_int64_t)fs->f_bfree;
+				uquad *= fs->f_bsize;
+			}
 			txdr_hyper(uquad, tl);
 			retnum += NFSX_HYPER;
 			break;
 		case NFSATTRBIT_SPACETOTAL:
 			NFSM_BUILD(tl, u_int32_t *, NFSX_HYPER);
-			uquad = (u_int64_t)fs->f_blocks;
-			uquad *= fs->f_bsize;
+			if (pnfssf != NULL) {
+				uquad = (u_int64_t)pnfssf->f_blocks;
+				uquad *= pnfssf->f_bsize;
+			} else {
+				uquad = (u_int64_t)fs->f_blocks;
+				uquad *= fs->f_bsize;
+			}
 			txdr_hyper(uquad, tl);
 			retnum += NFSX_HYPER;
 			break;
@@ -2514,6 +2579,33 @@ nfsv4_fillattr(struct nfsrv_descript *nd, struct mount
 			NFSCLRBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEACCESSSET);
 			retnum += nfsrv_putattrbit(nd, &attrbits);
 			break;
+		case NFSATTRBIT_FSLAYOUTTYPE:
+		case NFSATTRBIT_LAYOUTTYPE:
+			if (nfsrv_devidcnt == 0)
+				siz = 1;
+			else
+				siz = 2;
+			if (siz == 2) {
+				NFSM_BUILD(tl, u_int32_t *, 2 * NFSX_UNSIGNED);
+				*tl++ = txdr_unsigned(1);	/* One entry. */
+				if (nfsrv_doflexfile != 0 ||
+				    nfsrv_maxpnfsmirror > 1)
+					*tl = txdr_unsigned(NFSLAYOUT_FLEXFILE);
+				else
+					*tl = txdr_unsigned(
+					    NFSLAYOUT_NFSV4_1_FILES);
+			} else {
+				NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED);
+				*tl = 0;
+			}
+			retnum += siz * NFSX_UNSIGNED;
+			break;
+		case NFSATTRBIT_LAYOUTALIGNMENT:
+		case NFSATTRBIT_LAYOUTBLKSIZE:
+			NFSM_BUILD(tl, u_int32_t *, NFSX_UNSIGNED);
+			*tl = txdr_unsigned(NFS_SRVMAXIO);
+			retnum += NFSX_UNSIGNED;
+			break;
 		default:
 			printf("EEK! Bad V4 attribute bitpos=%d\n", bitpos);
 		}
@@ -4238,5 +4330,40 @@ nfsv4_freeslot(struct nfsclsession *sep, int slot)
 	sep->nfsess_slots &= ~bitval;
 	wakeup(&sep->nfsess_slots);
 	mtx_unlock(&sep->nfsess_mtx);
+}
+
+/*
+ * Search for a matching pnfsd mirror device structure, base on the nmp arg.
+ * Return one if found, NULL otherwise.
+ */
+struct nfsdevice *
+nfsv4_findmirror(struct nfsmount *nmp)
+{
+	struct nfsdevice *ds, *fndds;
+	int fndmirror;
+
+	mtx_assert(NFSDDSMUTEXPTR, MA_OWNED);
+	/*
+	 * Search the DS server list for a match with nmp.
+	 * Remove the DS entry if found and there is a mirror.
+	 */
+	fndds = NULL;
+	fndmirror = 0;
+	if (nfsrv_devidcnt == 0)
+		return (fndds);
+	TAILQ_FOREACH(ds, &nfsrv_devidhead, nfsdev_list) {
+		if (ds->nfsdev_nmp == nmp) {
+			NFSCL_DEBUG(4, "fnd main ds\n");
+			fndds = ds;
+		} else if (ds->nfsdev_nmp != NULL)
+			fndmirror = 1;
+		if (fndds != NULL && fndmirror != 0)
+			break;
+	}
+	if (fndmirror == 0) {
+		NFSCL_DEBUG(4, "no mirror for DS\n");
+		return (NULL);
+	}
+	return (fndds);
 }
 

Modified: head/sys/fs/nfs/nfs_var.h
==============================================================================
--- head/sys/fs/nfs/nfs_var.h	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfs_var.h	Tue Jun 12 19:36:32 2018	(r335012)
@@ -63,6 +63,7 @@ union nethostaddr;
 struct nfsstate;
 struct nfslock;
 struct nfsclient;
+struct nfslayout;
 struct nfsdsession;
 struct nfslockconflict;
 struct nfsd_idargs;
@@ -82,6 +83,9 @@ struct nfsv4lock;
 struct nfsvattr;
 struct nfs_vattr;
 struct NFSSVCARGS;
+struct nfsdevice;
+struct pnfsdsfile;
+struct pnfsdsattr;
 #ifdef __FreeBSD__
 NFS_ACCESS_ARGS;
 NFS_OPEN_ARGS;
@@ -112,9 +116,9 @@ int nfsrv_openctrl(struct nfsrv_descript *, vnode_t,
 int nfsrv_opencheck(nfsquad_t, nfsv4stateid_t *, struct nfsstate *,
     vnode_t, struct nfsrv_descript *, NFSPROC_T *, int);
 int nfsrv_openupdate(vnode_t, struct nfsstate *, nfsquad_t,
-    nfsv4stateid_t *, struct nfsrv_descript *, NFSPROC_T *);
+    nfsv4stateid_t *, struct nfsrv_descript *, NFSPROC_T *, int *);
 int nfsrv_delegupdate(struct nfsrv_descript *, nfsquad_t, nfsv4stateid_t *,
-    vnode_t, int, struct ucred *, NFSPROC_T *);
+    vnode_t, int, struct ucred *, NFSPROC_T *, int *);
 int nfsrv_releaselckown(struct nfsstate *, nfsquad_t, NFSPROC_T *);
 void nfsrv_zapclient(struct nfsclient *, NFSPROC_T *);
 int nfssvc_idname(struct nfsd_idargs *);
@@ -131,7 +135,7 @@ int nfsrv_checksetattr(vnode_t, struct nfsrv_descript 
     nfsv4stateid_t *, struct nfsvattr *, nfsattrbit_t *, struct nfsexstuff *,
     NFSPROC_T *);
 int nfsrv_checkgetattr(struct nfsrv_descript *, vnode_t,
-    struct nfsvattr *, nfsattrbit_t *, struct ucred *, NFSPROC_T *);
+    struct nfsvattr *, nfsattrbit_t *, NFSPROC_T *);
 int nfsrv_nfsuserdport(struct sockaddr *, u_short, NFSPROC_T *);
 void nfsrv_nfsuserddelport(void);
 void nfsrv_throwawayallstate(NFSPROC_T *);
@@ -140,6 +144,30 @@ int nfsrv_checksequence(struct nfsrv_descript *, uint3
 int nfsrv_checkreclaimcomplete(struct nfsrv_descript *);
 void nfsrv_cache_session(uint8_t *, uint32_t, int, struct mbuf **);
 void nfsrv_freeallbackchannel_xprts(void);
+int nfsrv_layoutcommit(struct nfsrv_descript *, vnode_t, int, int, uint64_t,
+    uint64_t, uint64_t, int, struct timespec *, int, nfsv4stateid_t *,
+    int, char *, int *, uint64_t *, struct ucred *, NFSPROC_T *);
+int nfsrv_layoutget(struct nfsrv_descript *, vnode_t, struct nfsexstuff *,
+    int, int *, uint64_t *, uint64_t *, uint64_t, nfsv4stateid_t *, int, int *,
+    int *, char *, struct ucred *, NFSPROC_T *);
+void nfsrv_flexmirrordel(char *, NFSPROC_T *);
+void nfsrv_recalloldlayout(NFSPROC_T *);
+int nfsrv_layoutreturn(struct nfsrv_descript *, vnode_t, int, int, uint64_t,
+    uint64_t, int, int, nfsv4stateid_t *, int, uint32_t *, int *,
+    struct ucred *, NFSPROC_T *);
+int nfsrv_getdevinfo(char *, int, uint32_t *, uint32_t *, int *, char **);
+void nfsrv_freeonedevid(struct nfsdevice *);
+void nfsrv_freealllayoutsanddevids(void);
+void nfsrv_freefilelayouts(fhandle_t *);
+int nfsrv_deldsserver(char *, NFSPROC_T *);
+struct nfsdevice *nfsrv_deldsnmp(struct nfsmount *, NFSPROC_T *);
+int nfsrv_createdevids(struct nfsd_nfsd_args *, NFSPROC_T *);
+int nfsrv_checkdsattr(struct nfsrv_descript *, vnode_t, NFSPROC_T *);
+int nfsrv_copymr(vnode_t, vnode_t, vnode_t, struct nfsdevice *,
+    struct pnfsdsfile *, struct pnfsdsfile *, int, struct ucred *, NFSPROC_T *);
+int nfsrv_mdscopymr(char *, char *, char *, char *, int *, char *, NFSPROC_T *,
+    struct vnode **, struct vnode **, struct pnfsdsfile **, struct nfsdevice **,
+    struct nfsdevice **);
 
 /* nfs_nfsdserv.c */
 int nfsrvd_access(struct nfsrv_descript *, int,
@@ -240,6 +268,14 @@ int nfsrvd_destroysession(struct nfsrv_descript *, int
     vnode_t, NFSPROC_T *, struct nfsexstuff *);
 int nfsrvd_freestateid(struct nfsrv_descript *, int,
     vnode_t, NFSPROC_T *, struct nfsexstuff *);
+int nfsrvd_layoutget(struct nfsrv_descript *, int,
+    vnode_t, NFSPROC_T *, struct nfsexstuff *);
+int nfsrvd_getdevinfo(struct nfsrv_descript *, int,
+    vnode_t, NFSPROC_T *, struct nfsexstuff *);
+int nfsrvd_layoutcommit(struct nfsrv_descript *, int,
+    vnode_t, NFSPROC_T *, struct nfsexstuff *);
+int nfsrvd_layoutreturn(struct nfsrv_descript *, int,
+    vnode_t, NFSPROC_T *, struct nfsexstuff *);
 int nfsrvd_teststateid(struct nfsrv_descript *, int,
     vnode_t, NFSPROC_T *, struct nfsexstuff *);
 int nfsrvd_notsupp(struct nfsrv_descript *, int,
@@ -306,6 +342,7 @@ int nfsv4_sequencelookup(struct nfsmount *, struct nfs
     int *, uint32_t *, uint8_t *);
 void nfsv4_freeslot(struct nfsclsession *, int);
 struct ucred *nfsrv_getgrpscred(struct ucred *);
+struct nfsdevice *nfsv4_findmirror(struct nfsmount *);
 
 /* nfs_clcomsubs.c */
 void nfsm_uiombuf(struct nfsrv_descript *, struct uio *, int);
@@ -339,7 +376,7 @@ void nfsrv_wcc(struct nfsrv_descript *, int, struct nf
     struct nfsvattr *);
 int nfsv4_fillattr(struct nfsrv_descript *, struct mount *, vnode_t, NFSACL_T *,
     struct vattr *, fhandle_t *, int, nfsattrbit_t *,
-    struct ucred *, NFSPROC_T *, int, int, int, int, uint64_t);
+    struct ucred *, NFSPROC_T *, int, int, int, int, uint64_t, struct statfs *);
 void nfsrv_fillattr(struct nfsrv_descript *, struct nfsvattr *);
 void nfsrv_adj(mbuf_t, int, int);
 void nfsrv_postopattr(struct nfsrv_descript *, int, struct nfsvattr *);
@@ -387,8 +424,6 @@ int nfsrv_dissectace(struct nfsrv_descript *, struct a
     int *, int *, NFSPROC_T *);
 int nfsrv_buildacl(struct nfsrv_descript *, NFSACL_T *, enum vtype,
     NFSPROC_T *);
-int nfsrv_setacl(vnode_t, NFSACL_T *, struct ucred *,
-    NFSPROC_T *);
 int nfsrv_compareacl(NFSACL_T *, NFSACL_T *);
 
 /* nfs_clrpcops.c */
@@ -603,8 +638,8 @@ int ncl_flush(vnode_t, int, NFSPROC_T *, int, int);
 void ncl_invalcaches(vnode_t);
 
 /* nfs_nfsdport.c */
-int nfsvno_getattr(vnode_t, struct nfsvattr *, struct ucred *,
-    NFSPROC_T *, int);
+int nfsvno_getattr(vnode_t, struct nfsvattr *, struct nfsrv_descript *,
+    NFSPROC_T *, int, nfsattrbit_t *);
 int nfsvno_setattr(vnode_t, struct nfsvattr *, struct ucred *,
     NFSPROC_T *, struct nfsexstuff *);
 int nfsvno_getfh(vnode_t, fhandle_t *, NFSPROC_T *);
@@ -618,7 +653,7 @@ int nfsvno_readlink(vnode_t, struct ucred *, NFSPROC_T
     mbuf_t *, int *);
 int nfsvno_read(vnode_t, off_t, int, struct ucred *, NFSPROC_T *,
     mbuf_t *, mbuf_t *);
-int nfsvno_write(vnode_t, off_t, int, int, int, mbuf_t,
+int nfsvno_write(vnode_t, off_t, int, int, int *, mbuf_t,
     char *, struct ucred *, NFSPROC_T *);
 int nfsvno_createsub(struct nfsrv_descript *, struct nameidata *,
     vnode_t *, struct nfsvattr *, int *, int32_t *, NFSDEV_T, NFSPROC_T *,
@@ -647,7 +682,7 @@ void nfsvno_open(struct nfsrv_descript *, struct namei
     nfsv4stateid_t *, struct nfsstate *, int *, struct nfsvattr *, int32_t *,
     int, NFSACL_T *, nfsattrbit_t *, struct ucred *, NFSPROC_T *,
     struct nfsexstuff *, vnode_t *);
-int nfsvno_updfilerev(vnode_t, struct nfsvattr *, struct ucred *,
+int nfsvno_updfilerev(vnode_t, struct nfsvattr *, struct nfsrv_descript *,
     NFSPROC_T *);
 int nfsvno_fillattr(struct nfsrv_descript *, struct mount *, vnode_t,
     struct nfsvattr *, fhandle_t *, int, nfsattrbit_t *,
@@ -667,6 +702,17 @@ int nfsvno_testexp(struct nfsrv_descript *, struct nfs
 uint32_t nfsrv_hashfh(fhandle_t *);
 uint32_t nfsrv_hashsessionid(uint8_t *);
 void nfsrv_backupstable(void);
+int nfsrv_dsgetdevandfh(struct vnode *, NFSPROC_T *, int *, fhandle_t *,
+    char *);
+int nfsrv_dsgetsockmnt(struct vnode *, int, char *, int *, int *,
+    NFSPROC_T *, struct vnode **, fhandle_t *, char *, char *,
+    struct vnode **, struct nfsmount **, struct nfsmount *, int *, int *);
+int nfsrv_dscreate(struct vnode *, struct vattr *, struct vattr *,
+    fhandle_t *, struct pnfsdsfile *, struct pnfsdsattr *, char *,
+    struct ucred *, NFSPROC_T *, struct vnode **);
+int nfsrv_updatemdsattr(struct vnode *, struct nfsvattr *, NFSPROC_T *);
+void nfsrv_killrpcs(struct nfsmount *);
+int nfsrv_setacl(struct vnode *, NFSACL_T *, struct ucred *, NFSPROC_T *);
 
 /* nfs_commonkrpc.c */
 int newnfs_nmcancelreqs(struct nfsmount *);

Modified: head/sys/fs/nfs/nfsport.h
==============================================================================
--- head/sys/fs/nfs/nfsport.h	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfsport.h	Tue Jun 12 19:36:32 2018	(r335012)
@@ -701,10 +701,18 @@ void nfsrvd_rcv(struct socket *, void *, int);
 #define	NFSSESSIONMUTEXPTR(s)	(&((s)->mtx))
 #define	NFSLOCKSESSION(s)	mtx_lock(&((s)->mtx))
 #define	NFSUNLOCKSESSION(s)	mtx_unlock(&((s)->mtx))
+#define	NFSLAYOUTMUTEXPTR(l)	(&((l)->mtx))
 #define	NFSLOCKLAYOUT(l)	mtx_lock(&((l)->mtx))
 #define	NFSUNLOCKLAYOUT(l)	mtx_unlock(&((l)->mtx))
+#define	NFSDDSMUTEXPTR		(&nfsrv_dslock_mtx)
 #define	NFSDDSLOCK()		mtx_lock(&nfsrv_dslock_mtx)
 #define	NFSDDSUNLOCK()		mtx_unlock(&nfsrv_dslock_mtx)
+#define	NFSDDONTLISTMUTEXPTR	(&nfsrv_dontlistlock_mtx)
+#define	NFSDDONTLISTLOCK()	mtx_lock(&nfsrv_dontlistlock_mtx)
+#define	NFSDDONTLISTUNLOCK()	mtx_unlock(&nfsrv_dontlistlock_mtx)
+#define	NFSDRECALLMUTEXPTR	(&nfsrv_recalllock_mtx)
+#define	NFSDRECALLLOCK()	mtx_lock(&nfsrv_recalllock_mtx)
+#define	NFSDRECALLUNLOCK()	mtx_unlock(&nfsrv_recalllock_mtx)
 
 /*
  * Use these macros to initialize/free a mutex.
@@ -1036,6 +1044,15 @@ struct nfsreq {
  * used in both places that call getnewvnode().
  */
 extern const char nfs_vnode_tag[];
+
+/*
+ * Check for the errors that indicate a DS should be disabled.
+ * ENXIO indicates that the krpc cannot do an RPC on the DS.
+ * EIO is returned by the RPC as an indication of I/O problems on the
+ * server.
+ * Are there other fatal errors?
+ */
+#define	nfsds_failerr(e)	((e) == ENXIO || (e) == EIO)
 
 #endif	/* _KERNEL */
 

Modified: head/sys/fs/nfs/nfsproto.h
==============================================================================
--- head/sys/fs/nfs/nfsproto.h	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfsproto.h	Tue Jun 12 19:36:32 2018	(r335012)
@@ -260,6 +260,12 @@
 #define	NFSX_V4SETTIME		(NFSX_UNSIGNED + NFSX_V4TIME)
 #define	NFSX_V4SESSIONID	16
 #define	NFSX_V4DEVICEID		16
+#define	NFSX_V4PNFSFH		(sizeof(fhandle_t) + 1)
+#define	NFSX_V4FILELAYOUT	(4 * NFSX_UNSIGNED + NFSX_V4DEVICEID +	\
+				 NFSX_HYPER + NFSM_RNDUP(NFSX_V4PNFSFH))
+#define	NFSX_V4FLEXLAYOUT(m)	(NFSX_HYPER + 3 * NFSX_UNSIGNED +		\
+    ((m) * (NFSX_V4DEVICEID + NFSX_STATEID + NFSM_RNDUP(NFSX_V4PNFSFH) +	\
+    8 * NFSX_UNSIGNED)))
 
 /* sizes common to multiple NFS versions */
 #define	NFSX_FHMAX		(NFSX_V4FHMAX)
@@ -272,6 +278,11 @@
 /* variants for multiple versions */
 #define	NFSX_STATFS(v3)		((v3) ? NFSX_V3STATFS : NFSX_V2STATFS)
 
+/*
+ * Beware.  NFSPROC_NULL and friends are defined in
+ * <rpcsvc/nfs_prot.h> as well and the numbers are different.
+ */
+#ifndef	NFSPROC_NULL
 /* nfs rpc procedure numbers (before version mapping) */
 #define	NFSPROC_NULL		0
 #define	NFSPROC_GETATTR		1
@@ -295,6 +306,7 @@
 #define	NFSPROC_FSINFO		19
 #define	NFSPROC_PATHCONF	20
 #define	NFSPROC_COMMIT		21
+#endif	/* NFSPROC_NULL */
 
 /*
  * The lower numbers -> 21 are used by NFSv2 and v3. These define higher
@@ -652,6 +664,7 @@
 /* Flags for File Layout. */
 #define	NFSFLAYUTIL_DENSE		0x1
 #define	NFSFLAYUTIL_COMMIT_THRU_MDS	0x2
+#define	NFSFLAYUTIL_STRIPE_MASK		0xffffffc0
 
 /* Flags for Flex File Layout. */
 #define	NFSFLEXFLAG_NO_LAYOUTCOMMIT	0x00000001
@@ -668,6 +681,7 @@
 #define	NFSCDFS4_BACK		0x2
 #define	NFSCDFS4_BOTH		0x3
 
+#if defined(_KERNEL) || defined(KERNEL)
 /* Conversion macros */
 #define	vtonfsv2_mode(t,m) 						\
 		txdr_unsigned(((t) == VFIFO) ? MAKEIMODE(VCHR, (m)) : 	\
@@ -819,6 +833,7 @@ struct nfsv3_sattr {
 	u_int32_t sa_mtimetype;
 	nfstime3  sa_mtime;
 };
+#endif	/* _KERNEL */
 
 /*
  * The attribute bits used for V4.
@@ -1046,7 +1061,8 @@ struct nfsv3_sattr {
  	NFSATTRBM_MOUNTEDONFILEID |					\
 	NFSATTRBM_QUOTAHARD |                        			\
     	NFSATTRBM_QUOTASOFT |                        			\
-    	NFSATTRBM_QUOTAUSED)
+    	NFSATTRBM_QUOTAUSED |						\
+	NFSATTRBM_FSLAYOUTTYPE)
 
 
 #ifdef QUOTA
@@ -1062,7 +1078,11 @@ struct nfsv3_sattr {
 #define	NFSATTRBIT_SUPP1	NFSATTRBIT_S1
 #endif
 
-#define	NFSATTRBIT_SUPP2	NFSATTRBM_SUPPATTREXCLCREAT
+#define	NFSATTRBIT_SUPP2						\
+	(NFSATTRBM_LAYOUTTYPE |						\
+	NFSATTRBM_LAYOUTBLKSIZE |					\
+	NFSATTRBM_LAYOUTALIGNMENT |					\
+	NFSATTRBM_SUPPATTREXCLCREAT)
 
 /*
  * NFSATTRBIT_SUPPSETONLY is the OR of NFSATTRBIT_TIMEACCESSSET and
@@ -1378,5 +1398,15 @@ struct nfsv4stateid {
 	u_int32_t	other[NFSX_STATEIDOTHER / NFSX_UNSIGNED];
 };
 typedef struct nfsv4stateid nfsv4stateid_t;
+
+/* Notify bits and notify bitmap size. */
+#define	NFSV4NOTIFY_CHANGE	1
+#define	NFSV4NOTIFY_DELETE	2
+#define	NFSV4_NOTIFYBITMAP	1	/* # of 32bit values needed for bits */
+
+/* Layoutreturn kinds. */
+#define	NFSV4LAYOUTRET_FILE	1
+#define	NFSV4LAYOUTRET_FSID	2
+#define	NFSV4LAYOUTRET_ALL	3
 
 #endif	/* _NFS_NFSPROTO_H_ */
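
For readers sizing layout buffers, the arithmetic behind the new NFSX_V4FILELAYOUT and NFSX_V4FLEXLAYOUT(m) macros can be worked through in user space. This is only a sketch: the constant values below and the NFSM_RNDUP definition are assumptions (they are not shown in this excerpt), and FH_SIZE is a hypothetical stand-in for sizeof(fhandle_t). The helper flexlayout_size() is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; the real definitions are not part of this excerpt. */
#define NFSX_UNSIGNED	4
#define NFSX_HYPER	8
#define NFSX_V4DEVICEID	16
#define NFSX_STATEID	16
#define FH_SIZE		28	/* hypothetical stand-in for sizeof(fhandle_t) */

/* Assumed definition: round up to the 4 byte unit that XDR uses. */
#define NFSM_RNDUP(a)	(((a) + 3) & ~0x3)

/* The three macros from the diff, with FH_SIZE substituted. */
#define NFSX_V4PNFSFH	(FH_SIZE + 1)
#define NFSX_V4FILELAYOUT	(4 * NFSX_UNSIGNED + NFSX_V4DEVICEID +	\
				 NFSX_HYPER + NFSM_RNDUP(NFSX_V4PNFSFH))
#define NFSX_V4FLEXLAYOUT(m)	(NFSX_HYPER + 3 * NFSX_UNSIGNED +	\
    ((m) * (NFSX_V4DEVICEID + NFSX_STATEID + NFSM_RNDUP(NFSX_V4PNFSFH) + \
    8 * NFSX_UNSIGNED)))

/* Under the assumptions above: 16 + 16 + 8 + 32 bytes. */
static_assert(NFSX_V4FILELAYOUT == 72, "file layout size");

/* Hypothetical helper: XDR bytes needed for a flex layout with m mirrors. */
static size_t
flexlayout_size(unsigned int m)
{
	return (NFSX_V4FLEXLAYOUT(m));
}
```

With these assumed constants the flex layout has a fixed 20 byte part plus 96 bytes per mirror, so the size grows linearly with the mirror count.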

Modified: head/sys/fs/nfs/nfsrvstate.h
==============================================================================
--- head/sys/fs/nfs/nfsrvstate.h	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfs/nfsrvstate.h	Tue Jun 12 19:36:32 2018	(r335012)
@@ -31,6 +31,7 @@
 #ifndef _NFS_NFSRVSTATE_H_
 #define	_NFS_NFSRVSTATE_H_
 
+#if defined(_KERNEL) || defined(KERNEL)
 /*
  * Definitions for NFS V4 server state handling.
  */
@@ -46,6 +47,10 @@ LIST_HEAD(nfslockhead, nfslock);
 LIST_HEAD(nfslockhashhead, nfslockfile);
 LIST_HEAD(nfssessionhead, nfsdsession);
 LIST_HEAD(nfssessionhashhead, nfsdsession);
+TAILQ_HEAD(nfslayouthead, nfslayout);
+SLIST_HEAD(nfsdsdirhead, nfsdsdir);
+TAILQ_HEAD(nfsdevicehead, nfsdevice);
+LIST_HEAD(nfsdontlisthead, nfsdontlist);
 
 /*
  * List head for nfsusrgrp.
@@ -74,6 +79,13 @@ struct nfssessionhash {
 #define	NFSSESSIONHASH(f) 						\
 	(&nfssessionhash[nfsrv_hashsessionid(f) % nfsrv_sessionhashsize])
 
+struct nfslayouthash {
+	struct mtx		mtx;
+	struct nfslayouthead	list;
+};
+#define	NFSLAYOUTHASH(f) 						\
+	(&nfslayouthash[nfsrv_hashfh(f) % nfsrv_layouthashsize])
+
 /*
  * Client server structure for V4. It is doubly linked into two lists.
  * The first is a hash table based on the clientid and the second is a
@@ -112,6 +124,31 @@ struct nfsclient {
 #define	CLOPS_RENEWOP		0x0004
 
 /*
+ * Structure for NFSv4.1 Layouts.
+ * Malloc'd to correct size for the lay_xdr.
+ */
+struct nfslayout {
+	TAILQ_ENTRY(nfslayout)	lay_list;
+	nfsv4stateid_t		lay_stateid;
+	nfsquad_t		lay_clientid;
+	fhandle_t		lay_fh;
+	fsid_t			lay_fsid;
+	uint32_t		lay_layoutlen;
+	uint16_t		lay_mirrorcnt;
+	uint16_t		lay_trycnt;
+	uint16_t		lay_type;
+	uint16_t		lay_flags;
+	uint32_t		lay_xdr[0];
+};
+
+/* Flags for lay_flags. */
+#define	NFSLAY_READ	0x0001
+#define	NFSLAY_RW	0x0002
+#define	NFSLAY_RECALL	0x0004
+#define	NFSLAY_RETURNED	0x0008
+#define	NFSLAY_CALLB	0x0010
+
+/*
  * Structure for an NFSv4.1 session.
  * Locking rules for this structure.
  * To add/delete one of these structures from the lists, you must lock
@@ -290,9 +327,72 @@ struct nfsf_rec {
 	u_int32_t	numboots;		/* Number of boottimes */
 };
 
-#if defined(_KERNEL) || defined(KERNEL)
 void nfsrv_cleanclient(struct nfsclient *, NFSPROC_T *);
 void nfsrv_freedeleglist(struct nfsstatehead *);
-#endif
+
+/*
+ * This structure is used to create the list of device info entries for
+ * a GetDeviceInfo operation and stores the DS server info.
+ * The nfsdev_addrandhost field has the fully qualified host domain name
+ * followed by the network address in XDR.
+ * It is allocated with nfsrv_dsdirsize nfsdev_dsdir[] entries.
+ */
+struct nfsdevice {
+	TAILQ_ENTRY(nfsdevice)	nfsdev_list;
+	vnode_t			nfsdev_dvp;
+	struct nfsmount		*nfsdev_nmp;
+	char			nfsdev_deviceid[NFSX_V4DEVICEID];
+	uint16_t		nfsdev_hostnamelen;
+	uint16_t		nfsdev_fileaddrlen;
+	uint16_t		nfsdev_flexaddrlen;
+	char			*nfsdev_fileaddr;
+	char			*nfsdev_flexaddr;
+	char			*nfsdev_host;
+	uint32_t		nfsdev_nextdir;
+	vnode_t			nfsdev_dsdir[0];
+};
+
+/*
+ * This structure holds the va_size, va_filerev, va_atime and va_mtime for the
+ * DS file and is stored in the metadata file's extended attribute pnfsd.dsattr.
+ */
+struct pnfsdsattr {
+	uint64_t	dsa_filerev;
+	uint64_t	dsa_size;
+	struct timespec	dsa_atime;
+	struct timespec	dsa_mtime;
+};
+
+/*
+ * This structure is a list element for a list the pNFS server uses to
+ * mark that the recovery of a mirror file is in progress.
+ */
+struct nfsdontlist {
+	LIST_ENTRY(nfsdontlist)	nfsmr_list;
+	uint32_t		nfsmr_flags;
+	fhandle_t		nfsmr_fh;
+};
+
+/* nfsmr_flags bits. */
+#define	NFSMR_DONTLAYOUT	0x00000001
+
+#endif	/* defined(_KERNEL) || defined(KERNEL) */
+
+/*
+ * This structure holds the information about the DS file and is stored
+ * in the metadata file's extended attribute called pnfsd.dsfile.
+ */
+#define	PNFS_FILENAME_LEN	(2 * sizeof(fhandle_t))
+struct pnfsdsfile {
+	fhandle_t	dsf_fh;
+	uint32_t	dsf_dir;
+	union {
+		struct sockaddr_in	sin;
+		struct sockaddr_in6	sin6;
+	} dsf_nam;
+	char		dsf_filename[PNFS_FILENAME_LEN + 1];
+};
+#define	dsf_sin		dsf_nam.sin
+#define	dsf_sin6	dsf_nam.sin6
 
 #endif	/* _NFS_NFSRVSTATE_H_ */
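
The comment on struct nfslayout says it is "Malloc'd to correct size for the lay_xdr", i.e. the XDR words live in a trailing array behind the fixed fields. A minimal user-space sketch of that allocation pattern follows. The struct here is a hypothetical, trimmed-down stand-in, layout_alloc() is an invented name, and it uses the C99 flexible array member rather than the lay_xdr[0] form in the header; the kernel code would use its own allocator and is not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct nfslayout. */
struct layout {
	uint32_t	lay_layoutlen;	/* bytes of XDR held in lay_xdr */
	uint16_t	lay_mirrorcnt;
	uint16_t	lay_flags;
	uint32_t	lay_xdr[];	/* C99 flexible array member */
};

/*
 * Allocate one object holding the fixed fields plus xdrlen bytes of
 * XDR data.  Returns NULL if the allocation fails.
 */
static struct layout *
layout_alloc(const void *xdr, uint32_t xdrlen, uint16_t mirrors)
{
	struct layout *lyp;

	/* XDR data is always a whole number of 4 byte words. */
	assert(xdrlen % sizeof(uint32_t) == 0);

	lyp = calloc(1, sizeof(*lyp) + xdrlen);
	if (lyp == NULL)
		return (NULL);
	lyp->lay_layoutlen = xdrlen;
	lyp->lay_mirrorcnt = mirrors;
	if (xdrlen != 0)
		memcpy(lyp->lay_xdr, xdr, xdrlen);
	return (lyp);
}
```

A single free() of the returned pointer releases both the header and the XDR payload, which is the main attraction of the trailing-array layout.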

Modified: head/sys/fs/nfsclient/nfs_clport.c
==============================================================================
--- head/sys/fs/nfsclient/nfs_clport.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfsclient/nfs_clport.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -86,6 +86,7 @@ extern int nfs_numnfscbd;
 extern int nfscl_inited;
 struct mtx ncl_iod_mutex;
 NFSDLOCKMUTEX;
+extern struct mtx nfsrv_dslock_mtx;
 
 extern void (*ncl_call_invalcaches)(struct vnode *);
 
@@ -930,7 +931,7 @@ nfscl_fillsattr(struct nfsrv_descript *nd, struct vatt
 		if (vap->va_mtime.tv_sec != VNOVAL)
 			NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_TIMEMODIFYSET);
 		(void) nfsv4_fillattr(nd, vp->v_mount, vp, NULL, vap, NULL, 0,
-		    &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0);
+		    &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0, NULL);
 		break;
 	}
 }
@@ -1383,6 +1384,13 @@ nfssvc_nfscl(struct thread *td, struct nfssvc_args *ua
 				    0 && strcmp(mp->mnt_stat.f_fstypename,
 				    "nfs") == 0 && mp->mnt_data != NULL) {
 					nmp = VFSTONFS(mp);
+					NFSDDSLOCK();
+					if (nfsv4_findmirror(nmp) != NULL) {
+						NFSDDSUNLOCK();
+						error = ENXIO;
+						nmp = NULL;
+						break;
+					}
 					mtx_lock(&nmp->nm_mtx);
 					if ((nmp->nm_privflag &
 					    NFSMNTP_FORCEDISM) == 0) {
@@ -1394,6 +1402,7 @@ nfssvc_nfscl(struct thread *td, struct nfssvc_args *ua
 						mtx_unlock(&nmp->nm_mtx);
 						nmp = NULL;
 					}
+					NFSDDSUNLOCK();
 					break;
 				}
 			}
@@ -1418,7 +1427,7 @@ nfssvc_nfscl(struct thread *td, struct nfssvc_args *ua
 				nmp->nm_privflag &= ~NFSMNTP_CANCELRPCS;
 				wakeup(nmp);
 				mtx_unlock(&nmp->nm_mtx);
-			} else
+			} else if (error == 0)
 				error = EINVAL;
 		}
 		free(buf, M_TEMP);

Modified: head/sys/fs/nfsclient/nfs_clrpcops.c
==============================================================================
--- head/sys/fs/nfsclient/nfs_clrpcops.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfsclient/nfs_clrpcops.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -4620,7 +4620,7 @@ nfsrpc_setaclrpc(vnode_t vp, struct ucred *cred, NFSPR
 	NFSZERO_ATTRBIT(&attrbits);
 	NFSSETBIT_ATTRBIT(&attrbits, NFSATTRBIT_ACL);
 	(void) nfsv4_fillattr(nd, vnode_mount(vp), vp, aclp, NULL, NULL, 0,
-	    &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0);
+	    &attrbits, NULL, NULL, 0, 0, 0, 0, (uint64_t)0, NULL);
 	error = nfscl_request(nd, vp, p, cred, stuff);
 	if (error)
 		return (error);

Modified: head/sys/fs/nfsclient/nfs_clstate.c
==============================================================================
--- head/sys/fs/nfsclient/nfs_clstate.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfsclient/nfs_clstate.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -3373,7 +3373,7 @@ nfscl_docb(struct nfsrv_descript *nd, NFSPROC_T *p)
 			if (!error)
 				(void) nfsv4_fillattr(nd, NULL, NULL, NULL, &va,
 				    NULL, 0, &rattrbits, NULL, p, 0, 0, 0, 0,
-				    (uint64_t)0);
+				    (uint64_t)0, NULL);
 			break;
 		case NFSV4OP_CBRECALL:
 			NFSCL_DEBUG(4, "cbrecall\n");

Modified: head/sys/fs/nfsclient/nfs_clvfsops.c
==============================================================================
--- head/sys/fs/nfsclient/nfs_clvfsops.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfsclient/nfs_clvfsops.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -86,6 +86,7 @@ extern enum nfsiod_state ncl_iodwant[NFS_MAXASYNCDAEMO
 extern struct nfsmount *ncl_iodmount[NFS_MAXASYNCDAEMON];
 extern struct mtx ncl_iod_mutex;
 NFSCLSTATEMUTEX;
+extern struct mtx nfsrv_dslock_mtx;
 
 MALLOC_DEFINE(M_NEWNFSREQ, "newnfsclient_req", "NFS request header");
 MALLOC_DEFINE(M_NEWNFSMNT, "newnfsmnt", "NFS mount struct");
@@ -1672,6 +1673,7 @@ nfs_unmount(struct mount *mp, int mntflags)
 	if (mntflags & MNT_FORCE)
 		flags |= FORCECLOSE;
 	nmp = VFSTONFS(mp);
+	error = 0;
 	/*
 	 * Goes something like this..
 	 * - Call vflush() to clear out vnodes for this filesystem
@@ -1680,6 +1682,12 @@ nfs_unmount(struct mount *mp, int mntflags)
 	 */
 	/* In the forced case, cancel any outstanding requests. */
 	if (mntflags & MNT_FORCE) {
+		NFSDDSLOCK();
+		if (nfsv4_findmirror(nmp) != NULL)
+			error = ENXIO;
+		NFSDDSUNLOCK();
+		if (error)
+			goto out;
 		error = newnfs_nmcancelreqs(nmp);
 		if (error)
 			goto out;
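
The nfs_unmount() hunk above checks for a mirror while holding the DS lock, records the error, drops the lock and only then branches. The same shape can be sketched in user space with a pthread mutex. Everything named here is hypothetical: ds_lock stands in for NFSDDSLOCK()/NFSDDSUNLOCK(), mount_is_mirror for the result of nfsv4_findmirror(nmp) != NULL, and cancel_requests() for newnfs_nmcancelreqs(). Reading ENXIO as "this mount is in use as a mirror" is an inference from the function name, not something the log states.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t ds_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical state, protected by ds_lock. */
static bool mount_is_mirror;

/* Placeholder for the real request cancellation step. */
static int
cancel_requests(void)
{
	return (0);
}

/* Refuse a forced unmount with ENXIO while the mount is flagged. */
static int
forced_unmount(void)
{
	int error, rc;

	error = 0;
	rc = pthread_mutex_lock(&ds_lock);
	assert(rc == 0);
	if (mount_is_mirror)
		error = ENXIO;
	rc = pthread_mutex_unlock(&ds_lock);
	assert(rc == 0);
	(void)rc;

	/* Branch only after the lock has been released. */
	if (error != 0)
		return (error);
	return (cancel_requests());
}
```

Recording the result in a local and unlocking before the early return keeps every exit path from the function lock-free, so no path can leave ds_lock held.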

Modified: head/sys/fs/nfsserver/nfs_nfsdkrpc.c
==============================================================================
--- head/sys/fs/nfsserver/nfs_nfsdkrpc.c	Tue Jun 12 19:26:25 2018	(r335011)
+++ head/sys/fs/nfsserver/nfs_nfsdkrpc.c	Tue Jun 12 19:36:32 2018	(r335012)
@@ -105,6 +105,7 @@ static int nfs_proc(struct nfsrv_descript *, u_int32_t
 extern u_long sb_max_adj;
 extern int newnfs_numnfsd;
 extern struct proc *nfsd_master_proc;
+extern time_t nfsdev_time;
 
 /*
  * NFS server system calls
@@ -495,6 +496,7 @@ nfsrvd_nfsd(struct thread *td, struct nfsd_nfsd_args *
 	 */
 	NFSD_LOCK();
 	if (newnfs_numnfsd == 0) {
+		nfsdev_time = time_second;
 		p = td->td_proc;
 		PROC_LOCK(p);
 		p->p_flag2 |= P2_AST_SU;
@@ -502,31 +504,36 @@ nfsrvd_nfsd(struct thread *td, struct nfsd_nfsd_args *
 		newnfs_numnfsd++;
 
 		NFSD_UNLOCK();
-
-		/* An empty string implies AUTH_SYS only. */
-		if (principal[0] != '\0') {
-			ret2 = rpc_gss_set_svc_name_call(principal,
-			    "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, NFS_VER2);
-			ret3 = rpc_gss_set_svc_name_call(principal,
-			    "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, NFS_VER3);
-			ret4 = rpc_gss_set_svc_name_call(principal,
-			    "kerberosv5", GSS_C_INDEFINITE, NFS_PROG, NFS_VER4);
-
-			if (!ret2 || !ret3 || !ret4)
-				printf("nfsd: can't register svc name\n");
+		error = nfsrv_createdevids(args, td);
+		if (error == 0) {
+			/* An empty string implies AUTH_SYS only. */
+			if (principal[0] != '\0') {
+				ret2 = rpc_gss_set_svc_name_call(principal,
+				    "kerberosv5", GSS_C_INDEFINITE, NFS_PROG,
+				    NFS_VER2);
+				ret3 = rpc_gss_set_svc_name_call(principal,
+				    "kerberosv5", GSS_C_INDEFINITE, NFS_PROG,
+				    NFS_VER3);
+				ret4 = rpc_gss_set_svc_name_call(principal,
+				    "kerberosv5", GSS_C_INDEFINITE, NFS_PROG,
+				    NFS_VER4);
+	
+				if (!ret2 || !ret3 || !ret4)
+					printf(
+					    "nfsd: can't register svc name\n");
+			}
+	
+			nfsrvd_pool->sp_minthreads = args->minthreads;
+			nfsrvd_pool->sp_maxthreads = args->maxthreads;
+				
+			svc_run(nfsrvd_pool);
+	
+			if (principal[0] != '\0') {
+				rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER2);
+				rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER3);
+				rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER4);
+			}
 		}
-
-		nfsrvd_pool->sp_minthreads = args->minthreads;
-		nfsrvd_pool->sp_maxthreads = args->maxthreads;
-			
-		svc_run(nfsrvd_pool);
-
-		if (principal[0] != '\0') {
-			rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER2);
-			rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER3);
-			rpc_gss_clear_svc_name_call(NFS_PROG, NFS_VER4);
-		}
-
 		NFSD_LOCK();
 		newnfs_numnfsd--;
 		nfsrvd_init(1);
@@ -555,6 +562,7 @@ nfsrvd_init(int terminating)
 	if (terminating) {
 		nfsd_master_proc = NULL;
 		NFSD_UNLOCK();
+		nfsrv_freealllayoutsanddevids();

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
