Date:      Wed, 9 Jan 2013 15:57:03 +0200
From:      Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: Problems Re-Starting mountd
Message-ID:  <20130109135703.GB1574@pm513-1.comsys.ntu-kpi.kiev.ua>
In-Reply-To: <972459831.1800222.1357690721032.JavaMail.root@erie.cs.uoguelph.ca>
References:  <50EC39A8.3070108@cse.yorku.ca> <972459831.1800222.1357690721032.JavaMail.root@erie.cs.uoguelph.ca>

On Tue, Jan 08, 2013 at 07:18:41PM -0500, Rick Macklem wrote:
> Jason Keltz wrote:
> > On 01/08/2013 10:05 AM, Andrey Simonenko wrote:
> > > I created 2000 file systems on ZFS file system backed by vnode md(4)
> > > device. The /etc/exports file contains 4000 entries like your
> > > example.
> > >
> > > On 9.1-STABLE mountd spends ~70 seconds in flushing current NFS
> > > exports
> > > in the NFS server, parsing data from /etc/exports and loading parsed
> > > data into the NFS server. ~70 seconds is not several minutes. Most
> > > of
> > > time mountd spends in nmount() system call in "zio->io_cv" lock.
> > >
> > > Can you show the output of "truss -fc -o /tmp/output.txt mountd"
> > > (wait wchan "select" state of mountd and terminate it by a signal).
> > > If everything is correct you should see N statfs() calls, N+M
> > > nmount()
> > > calls and something*N lstat() calls, where N is the number of
> > > /etc/exports
> > > lines, M is the number of mounted file systems. Number of lstat()
> > > calls
> > > depends on number of components in pathnames.
> > 
> > Andrey,
> > 
> > Would that still be an ~70 second period in which new mounts would not
> > be allowed? In the system I'm preparing, I'll have at least 4000
> > entries
> > in /etc/exports, probably even more, so I know I'll be dealing with
> > the
> > same issue that Tim is dealing with when I get there. However, I don't
> > see how to avoid the issue ... If I want new users to be able to login
> > shortly after their account is created, and each user has a ZFS
> > filesystem as a home directory, then at least at some interval, after
> > adding a user to the system, I need to update the exports file on the
> > file server, and re-export everything. Yet, even a >1 minute delay
> > where users who are logging in won't get their home directory mounted
> > on
> > the system they are logging into - well, that's not so good...
> > accounts
> > can be added all the time and this would create random chaos. Isn't
> > there some way to make it so that when you re-export everything, the
> > existing exports are still served until the new exports are ready?
> I can't think of how you'd do everything without deleting the old stuff,
> but it would be possible to "add new entries". It has to be done by
> modifying mountd, since it keeps a tree in its address space that it uses
> for mount requests and the tree must be grown.
> 
> I don't know about nfse, but you'd have to add this capability to mountd
> and, trust me, it's an ugly old piece of C code, so coming up with a patch
> might not be that easy. However, it might not be that bad, since the only difference
> from doing the full reload as it stands now would be to "not delete
> the tree that already exists in the utility and don't do the DELEXPORTS
> syscall" I think, so the old ones don't go away. There could be a file called
> something like /etc/exports.new for the new entries and a different
> signal (SIGUSR1??) to load these. (Then you'd add the new entries to
> /etc/exports as well for the next time mountd restarts, but wouldn't
> send it a SIGHUP.)

The delay in the example described above comes from the ZFS kernel code: the
same configuration for 2000 nullfs(5) file systems takes ~0.20 seconds
(less than a second) in mountd's nmount() system calls.  At least on
9.1-STABLE I do not see this delay coming from the mountd code itself; it
comes from the nmount() calls that mountd makes.
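To see this for yourself, the measurement can be reproduced with the truss
invocation suggested earlier in the thread (the output file name here is an
arbitrary choice):

```shell
# Run mountd under truss with per-syscall counting; wait until mountd
# sleeps on the "select" wait channel, then terminate it with a signal.
truss -fc -o /tmp/mountd-syscalls.txt mountd
# For N lines in /etc/exports and M mounted file systems, the summary
# should show N statfs() and N+M nmount() calls; on ZFS nearly all of
# the elapsed time is attributed to nmount().
```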

Since nfse was mentioned in this thread, I can explain how this is
implemented in nfse.

The nfse utility and its NFSE API support dynamic commands; in fact all
settings are updated through the same API.  The API allows flushing the
whole configuration, flushing/clearing the configuration of a file system,
and adding/updating/deleting the configuration for an address specification.
Commands can be grouped, so a single nfssvc() call can carry several
commands.  Not all commands have to be grouped together: the API uses a
transaction model, and while a transaction is open it can be used to pass
commands into the NFS server.  When all commands are ready, the transaction
is committed.  Each transaction has a timeout, and it is possible to have
several transactions open in one process or in several processes.

The nfse utility has the -c option, which allows commands to be given on
the command line.  For example, a user can add several lines to the
configuration file:

/fs/user1 -network 10.1/16 -network 10.2/16
/fs/user2 -network 10.3/16 1.1.1.1

Then, instead of reloading the whole configuration, one can add just these
settings:

# nfse -c 'add /fs/user1 -network 10.1/16 -network 10.2/16' \
       -c 'add /fs/user2 -network 10.3/16 1.1.1.1'

Alternatively, it is possible to keep the settings for each user in a
separate file, e.g. /etc/nfs-export/user1 and /etc/nfs-export/user2, and
then:

# nfse -c 'add -f /etc/nfs-export/user1' -c 'add -f /etc/nfs-export/user2'

Since the number of users can change, nfse should be started like this:

# nfse /etc/nfs-export/

and it will take all regular files from the given directory (or
directories).
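The per-user files and the dynamic "add" command can be combined in a hook
run right after an account is created, so a new user's export is live
without a full reload.  A minimal sketch, assuming the /fs/<user> layout
and the 10.1/16 network from the examples above (the NFSE and EXPORTDIR
override variables are my own convention, not part of nfse):

```shell
#!/bin/sh
# Hypothetical post-adduser hook.  NFSE and EXPORTDIR can be
# overridden, e.g. NFSE=echo for a dry run.
NFSE="${NFSE:-nfse}"
EXPORTDIR="${EXPORTDIR:-/etc/nfs-export}"

add_user_export() {
	user="$1"
	# Persistent per-user settings: picked up the next time nfse
	# is started with the export directory as its argument.
	printf '/fs/%s -network 10.1/16\n' "$user" > "$EXPORTDIR/$user"
	# Dynamic update: load only this user's exports into the
	# running NFS server, leaving existing exports untouched.
	$NFSE -c "add -f $EXPORTDIR/$user"
}
```

Removing an account would run the same command with "delete" in place of
"add".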

When it is necessary to remove the NFS exports for a user:

# nfse -c 'delete /fs/user1 -network 10.1/16 -network 10.2/16' \
       -c 'delete /fs/user2 -network 10.3/16 1.1.1.1'

or

# nfse -c 'delete -f /etc/nfs-export/user1' \
       -c 'delete -f /etc/nfs-export/user2'

or

# nfse -c 'flush /fs/user1 /fs/user2'

Updating works like this:

# nfse -c 'update /fs/user1 -ro -network 10.1/16'

I checked nfse on 9.1-STABLE with the example given above.  It takes nfse
~0.10 seconds to configure 2000 ZFS file systems; this time is mostly
spent in nfssvc() calls (the number of calls depends on how many commands
are grouped into one nfssvc() call).

I did not measure the delay seen by NFS clients while the NFS export
settings are updated, but it will be less than the time used by nfse
itself, since the NFSE code in the NFS server uses deferred data releasing
and needs to acquire only a small number of locks.  Two kinds of locks are
acquired while all NFS export settings are updated: one lock for the
transaction, and one lock for each passed security flavor list and
credential specification.  Each security flavor list, credential
specification, or other specification is passed in its own command, so if
~2000 file systems are exported to the same address specification, the
corresponding security flavor list and credential specification are passed
only once.


