Date: Wed, 09 Jan 2013 13:36:41 -0500
From: Jason Keltz
To: Andrey Simonenko
Subject: Re: Problems Re-Starting mountd
Cc: FreeBSD Filesystems

On 01/09/2013 08:57 AM, Andrey Simonenko wrote:
> On Tue, Jan 08, 2013 at 07:18:41PM -0500, Rick Macklem wrote:
>> Jason Keltz wrote:
>>> On 01/08/2013 10:05 AM, Andrey Simonenko wrote:
>>>> I created 2000 file systems on ZFS file system backed by vnode md(4)
>>>> device. The /etc/exports file contains 4000 entries like your
>>>> example.
>>>>
>>>> On 9.1-STABLE mountd spends ~70 seconds in flushing current NFS
>>>> exports in the NFS server, parsing data from /etc/exports and loading
>>>> parsed data into the NFS server. ~70 seconds is not several minutes.
>>>> Most of time mountd spends in nmount() system call in "zio->io_cv"
>>>> lock.
>>>>
>>>> Can you show the output of "truss -fc -o /tmp/output.txt mountd"
>>>> (wait wchan "select" state of mountd and terminate it by a signal).
>>>> If everything is correct you should see N statfs() calls, N+M
>>>> nmount() calls and something*N lstat() calls, where N is the number
>>>> of /etc/exports lines, M is the number of mounted file systems.
>>>> Number of lstat() calls depends on number of components in
>>>> pathnames.
>>> Andrey,
>>>
>>> Would that still be an ~70 second period in which new mounts would not
>>> be allowed? In the system I'm preparing, I'll have at least 4000
>>> entries in /etc/exports, probably even more, so I know I'll be dealing
>>> with the same issue that Tim is dealing with when I get there.
>>> However, I don't see how to avoid the issue ... If I want new users
>>> to be able to login shortly after their account is created, and each
>>> user has a ZFS filesystem as a home directory, then at least at some
>>> interval, after adding a user to the system, I need to update the
>>> exports file on the file server, and re-export everything. Yet, even
>>> a >1 minute delay where users who are logging in won't get their home
>>> directory mounted on the system they are logging into - well, that's
>>> not so good... accounts can be added all the time and this would
>>> create random chaos. Isn't there some way to make it so that when you
>>> re-export everything, the existing exports are still served until the
>>> new exports are ready?
>> I can't think of how you'd do everything without deleting the old
>> stuff, but it would be possible to "add new entries". It has to be done
>> by modifying mountd, since it keeps a tree in its address space that it
>> uses for mount requests and the tree must be grown.
>>
>> I don't know about nfse, but you'd have to add this capability to
>> mountd and, trust me, it's an ugly old piece of C code, so coming up
>> with a patch might not be that easy. However, it might not be that bad,
>> since the only difference from doing the full reload as it stands now
>> would be to "not delete the tree that already exists in the utility and
>> don't do the DELEXPORTS syscall" I think, so the old ones don't go
>> away. There could be a file called something like /etc/exports.new for
>> the new entries and a different signal (SIGUSR1??) to load these. (Then
>> you'd add the new entries to /etc/exports as well for the next time
>> mountd restarts, but wouldn't send it a SIGHUP.)
> This delay in above described example came from ZFS kernel code, since
> the same configuration for 2000 nullfs(5) file systems takes ~0.20
> second (less than second) by mountd in nmount() system calls.
> At least on 9.1-STABLE I do not see that this delay came from mountd
> code, it came from nmount() used by mountd.
>
> Since nfse was mentioned in this thread, I can explain how this is
> implemented in nfse.
>
> The nfse utility and its NFSE API support dynamic commands, in fact all
> settings are updated using the same API. This API allows to flush all
> configuration, flush/clear file system configuration, add/update/delete
> configuration for address specification. All commands can be grouped,
> so one nfssvc() call can be called with several commands. Not all
> commands have to be grouped together, instead API uses transaction
> model and while some transaction is open it is possible to use it for
> passing commands into NFS server. When all commands are ready,
> transaction is committed. Each transaction has timeout and it is
> possible to have several transaction in one or in several processes.
>
> ...
> I checked nfse on 9.1-STABLE with above given example. It takes ~0.10
> second by nfse to configure 2000 ZFS file systems, this time mostly is
> spent in nfssvc() calls (number of calls depends on how many commands
> are grouped for one nfssvc() call).
>
> I did not check delay in NFSE code for NFS clients during updating of
> NFS export settings, but it will be less than time used by nfse, since
> NFSE code in the NFS server uses deferred data releasing and it require
> to acquire small number of locks. Two locks are acquire while all NFS
> export settings are updated, one lock is acquire for transaction and one
> lock is acquire for each passed security flavor list and credentials
> specification. Each security flavor list and credential specification
> or any specification is passed in own command, so if there are ~2000
> file systems exported to the same address specification, then
> corresponding security flavor list and credential specification are
> passed only one time.

Thanks for all of the helpful information on nfse.
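To make the transaction model described above a little more concrete, here is a minimal C sketch of the general idea: commands are queued in an open transaction that has a deadline, applied on commit to a private copy of the export table, and only then published, so the old exports stay in effect until the whole new set has been validated. Every name here (`struct txn`, `txn_begin()`, `txn_add()`, `txn_commit()`, the `CMD_*` kinds, the fixed table sizes) is hypothetical and for illustration only; this is not the NFSE API, which passes its commands to the kernel through nfssvc().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical command kinds modelled on the operations described in the
 * thread (flush all, add/update an export, delete an export).
 * These are NOT the real NFSE command codes. */
enum cmd_kind { CMD_FLUSH_ALL, CMD_ADD_EXPORT, CMD_DEL_EXPORT };

struct cmd {
    enum cmd_kind kind;
    const char *path;           /* caller-owned string; unused for FLUSH_ALL */
};

#define MAX_CMDS    64
#define MAX_EXPORTS 64

/* The "live" table that mount requests would be checked against. */
struct export_table {
    const char *paths[MAX_EXPORTS];
    size_t n;
};

/* A transaction queues commands; nothing touches the live table until commit. */
struct txn {
    struct cmd cmds[MAX_CMDS];
    size_t ncmds;
    time_t deadline;            /* commit is refused after this time */
    int open;
};

static int table_find(const struct export_table *t, const char *path)
{
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->paths[i], path) == 0)
            return (int)i;
    return -1;
}

/* Apply one command to a table; returns 0 on success, -1 on failure. */
static int table_apply(struct export_table *t, const struct cmd *c)
{
    int i;

    switch (c->kind) {
    case CMD_FLUSH_ALL:
        t->n = 0;
        return 0;
    case CMD_ADD_EXPORT:
        if (table_find(t, c->path) >= 0)
            return 0;           /* already exported: treat as an update */
        if (t->n == MAX_EXPORTS)
            return -1;          /* table full */
        t->paths[t->n++] = c->path;
        return 0;
    case CMD_DEL_EXPORT:
        i = table_find(t, c->path);
        if (i >= 0)
            t->paths[i] = t->paths[--t->n];  /* order is not preserved */
        return 0;
    }
    return -1;
}

void txn_begin(struct txn *tx, time_t now, int timeout_sec)
{
    tx->ncmds = 0;
    tx->deadline = now + timeout_sec;
    tx->open = 1;
}

int txn_add(struct txn *tx, enum cmd_kind kind, const char *path)
{
    if (!tx->open || tx->ncmds == MAX_CMDS)
        return -1;
    tx->cmds[tx->ncmds].kind = kind;
    tx->cmds[tx->ncmds].path = path;
    tx->ncmds++;
    return 0;
}

/* Apply all queued commands to a staging copy, then publish it in one step.
 * On failure or an expired transaction the live table is left untouched. */
int txn_commit(struct txn *tx, struct export_table *live, time_t now)
{
    struct export_table staging;

    assert(live != NULL);
    if (!tx->open)
        return -1;
    tx->open = 0;               /* a transaction is used at most once */
    if (now > tx->deadline)
        return -1;
    staging = *live;
    for (size_t i = 0; i < tx->ncmds; i++)
        if (table_apply(&staging, &tx->cmds[i]) != 0)
            return -1;
    *live = staging;            /* old exports were in effect until here */
    return 0;
}
```

The struct copy at the end of `txn_commit()` stands in for what a real server would presumably do with a pointer swap under a lock; the sketch is single-threaded and only illustrates that validation happens before anything becomes visible, which is the property asked about earlier in the thread (existing exports still served until the new ones are ready).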
In all fairness, I didn't know what nfse was initially, so I read about it here: http://sourceforge.net/projects/nfse/ (Given that you, Andrey, are the maintainer, I can see why you're such an expert in nfse!) :)

Will it work under 9.1, or is it still in development? Since nfse doesn't use the nmount() call (is this correct?), I get the impression that, whether or not it processes the entire export configuration (which I realize uses a format specific to nfse), we wouldn't see any delay when using ZFS. Is that right? Would the solution then be to use nfse, or do I still need to wait for it to become stable, perhaps when 10.0 is released?

Without having thought this through too much: since nfse is newer, it would be interesting if it offered an option for a "living" exports file. With a standard exports file, if you update it live, the changes aren't processed until you issue a command, and then everything is unexported and re-exported. What if you could change the exports file on the fly, and nfse would notice that the file had changed, compare its contents with the current exported state, and then sync the two states by adding or deleting exports? Of course, if processing the whole file takes so little time, maybe this doesn't make any sense... (and of course, if you ran out of space and accidentally truncated the file... whoops!)

Jason.
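The "living" exports idea amounts to a reconciliation step: compare the entries currently exported with the entries now in the file, and issue only the additions and deletions. A minimal C sketch of that comparison follows, assuming each exports entry can be treated as an opaque string (so a line whose options changed shows up as one delete plus one add) and that both lists fit in memory. `sort_entries()` and `diff_exports()` are hypothetical helpers, not part of mountd or nfse; detecting the file change and pushing the resulting commands to the NFS server are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_entry(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort entries in place so they can be compared in a single merge pass. */
void sort_entries(const char **v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_entry);
}

/*
 * Merge-walk two sorted lists of exports entries (hypothetical helper).
 *   have: what is exported right now
 *   want: what the exports file now says
 * Entries only in `want` go to to_add; entries only in `have` go to to_del.
 * Entries present in both are left alone, so they keep being served.
 * to_add needs room for nwant entries and to_del for nhave entries.
 */
void diff_exports(const char **have, size_t nhave,
                  const char **want, size_t nwant,
                  const char **to_add, size_t *nadd,
                  const char **to_del, size_t *ndel)
{
    size_t i = 0, j = 0;

    assert(nadd != NULL && ndel != NULL);
    *nadd = 0;
    *ndel = 0;
    while (i < nhave && j < nwant) {
        int c = strcmp(have[i], want[j]);
        if (c == 0) {           /* unchanged entry */
            i++;
            j++;
        } else if (c < 0) {     /* no longer in the file */
            to_del[(*ndel)++] = have[i++];
        } else {                /* new in the file */
            to_add[(*nadd)++] = want[j++];
        }
    }
    while (i < nhave)
        to_del[(*ndel)++] = have[i++];
    while (j < nwant)
        to_add[(*nadd)++] = want[j++];
}
```

After sorting, the comparison is a single linear pass, and only the entries that actually changed would need to be sent to the server. The truncation risk mentioned above still applies, though: a truncated file would look like a request to delete nearly every export, so a real tool would probably want a sanity check (for example, refusing to apply an unusually large number of deletions) before acting on the diff.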