From owner-freebsd-fs@FreeBSD.ORG Tue Apr 24 12:17:02 2012
Date: Tue, 24 Apr 2012 15:16:59 +0300
From: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
To: Garrett Wollman
Cc: freebsd-fs@freebsd.org
Subject: Re: Under what circumstances does the new NFS client return EAGAIN?
Message-ID: <20120424121659.GA11025@pm513-1.comsys.ntu-kpi.kiev.ua>
References: <20302.54862.344852.13627@hergotha.csail.mit.edu> <605429676.158764.1330571627121.JavaMail.root@erie.cs.uoguelph.ca> <20303.45967.708688.414986@hergotha.csail.mit.edu>
In-Reply-To: <20303.45967.708688.414986@hergotha.csail.mit.edu>

On Thu, Mar 01, 2012 at 12:36:15PM -0500, Garrett Wollman wrote:
> < said:
>
> > Unfortunately it is a well known issue that updating exports
> > is not done atomically. (I had a patch that suspended the nfsd
> > threads while exports were being updated, but it was felt to
> > be risky and zack@ was going to come up with a patch to fix this,
> > but I don't think he has committed anything.)
>
> That might be something that we at least would need. You don't need
> to suspend all of the nfsd threads, just delay responding to any
> request that fails access control until the filter programming is
> done. We may actually need to do something like that, if this machine
> is to be usable as a file server. (Can't have our users' jobs
> randomly breaking just because an administrator mounted a new
> filesystem.)

There are two types of NFS export settings handling in NFS servers:

1. All NFS export settings are loaded into the NFS server, so it can make decisions about exports itself. All address specifications are given as addresses and netmasks (no matter whether they were specified as explicit addresses or as domain names in the configuration files).

2. All NFS export settings are kept in user space. The NFS server keeps a cache of settings for clients' addresses and asks a user space program whenever the cache has no export information for some client's address. This approach makes it possible to specify exports by wildcard domain names.
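(Just to illustrate the difference: with the first type the decision is a plain masked-address comparison that the kernel can make on its own once the settings are loaded, roughly along these lines. This is only a minimal sketch in C; the structure and function names are made up for the example and do not come from any real implementation.)

#include <sys/types.h>
#include <stdbool.h>
#include <netinet/in.h>

/* One export address specification: network address plus netmask. */
struct export_net {
	struct in_addr	net;		/* network address (already masked) */
	struct in_addr	mask;		/* netmask */
};

/* Does the client address fall into the exported network? */
static bool
export_match(const struct export_net *en, struct in_addr client)
{
	return ((client.s_addr & en->mask.s_addr) == en->net.s_addr);
}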
When export settings are updated, in the first case it is necessary to update them atomically, so that the NFS server never sees partially loaded settings. In the second case the user land utility can synchronize its own view of the NFS settings and only needs to flush the NFS server's export settings cache. FreeBSD uses the first type.

I have already heard about suspending the NFS server threads in the kernel while NFS export settings are being parsed and loaded. Such an approach has several drawbacks: 1) the user land program that loads the settings can crash, 2) the time needed to load the settings into the NFS server is unbounded, since the data may not be in RAM. As a result, the time during which the NFS server threads stay suspended is unbounded as well. There are other problems, but I will not describe them here for the sake of brevity.

Now I want to describe how NFS export settings are loaded into the NFS server in my implementation. Export settings are loaded with the nfssvc(NFSSVC_EXPORT) system call. The settings are not passed in a single system call, so it is not necessary to build one buffer with all of them (the settings can also be given as a linked list).

All communication with the NFS server through the nfssvc(NFSSVC_EXPORT) system call follows a transaction model. A process asks the NFS server to start a new transaction, the NFS server creates it and reports the transaction ID back to the process. Each transaction is identified by PID, UID and transaction ID, so several processes can modify the NFS server export settings at the same time. Each transaction has a timeout; to simplify the implementation (because usually only one transaction is expected) a transaction can be in the BUSY, ACTIVE or INACTIVE state, and a single callout with one timeout is used for all transactions. If a transaction stays inactive for some period of time its context is released; if a process that works with this transaction still uses it, it will be notified that the transaction has disappeared and will start a new one.

When a process has loaded all settings into the NFS server, it issues a transaction commit command and all settings saved in the transaction context are applied to the NFS server export settings atomically. The NFS export settings are protected by r/w locks (an rmlock, for example). While NFS export settings are being loaded, the NFS server verifies whether they can be applied to the current configuration, so by the time the transaction is committed all data structures are already prepared and only have to be switched into the current configuration. (A rough sketch of this loading sequence from the user land side is given at the end of this message.)

To minimize the number of nfssvc() system calls, a process can combine transaction flags and choose how many different settings are sent in one nfssvc() call. To make the ABI of the NFS server interface more flexible for future changes, all settings are passed in so-called command structures, and export specifications are not hard coded into those structures; instead they are passed as variable-sized arrays, so the number and types of specifications can be changed without breaking the ABI.

When a file system is mounted it is not necessary to flush and then load all settings again; instead only the settings for the just mounted file system should be loaded. The same logic applies to a file system that is about to be unmounted. Using a SIGHUP signal from mount(8) is wrong, since it works only in some cases and does not work at all if file systems are mounted by another program. I used EVFILT_FS VQ_MOUNT and VQ_UNMOUNT kevents for the user land part and new vfs_mount_event and vfs_unmount_event EVENTHANDLERs for the kernel part (a small kevent example is also given below). The kernel part never relies on information about file system exports from user space.
Consider also the cases when a file system that shadows NFS exported file systems is mounted or unmounted.
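To make the transaction interface described above a bit easier to picture, here is a rough user land sketch of the flow (start a transaction, pass settings, commit). Only nfssvc(2) itself is a real system call; the flag value, the command codes and the command structure below are invented for this sketch and certainly differ from the real patch in detail.

#include <sys/types.h>
#include <stdint.h>
#include <string.h>

/*
 * The prototype normally comes from the NFS headers; it is declared here
 * only to keep the sketch self-contained.
 */
int	nfssvc(int flags, void *argp);

/*
 * Everything below is invented for this sketch: the flag value, the
 * command codes and the command structure in the real patch are
 * certainly different.
 */
#define NFSSVC_EXPORT	0x800		/* hypothetical flag value */

enum export_cmd { EXP_START, EXP_ADD, EXP_COMMIT, EXP_ABORT };

struct export_command {
	enum export_cmd	 ec_cmd;	/* what to do */
	uint64_t	 ec_txid;	/* transaction ID returned by EXP_START */
	size_t		 ec_datalen;	/* length of the variable-sized data */
	void		*ec_data;	/* export specifications for this call */
};

/* Load one batch of settings for a single file system and commit it. */
static int
load_exports(void *specs, size_t len)
{
	struct export_command cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.ec_cmd = EXP_START;
	if (nfssvc(NFSSVC_EXPORT, &cmd) == -1)	/* kernel fills in ec_txid */
		return (-1);

	cmd.ec_cmd = EXP_ADD;			/* may be repeated, batch by batch */
	cmd.ec_data = specs;
	cmd.ec_datalen = len;
	if (nfssvc(NFSSVC_EXPORT, &cmd) == -1)
		goto abort;

	cmd.ec_cmd = EXP_COMMIT;		/* settings become visible atomically */
	cmd.ec_data = NULL;
	cmd.ec_datalen = 0;
	if (nfssvc(NFSSVC_EXPORT, &cmd) == -1)
		goto abort;
	return (0);
abort:
	cmd.ec_cmd = EXP_ABORT;
	(void)nfssvc(NFSSVC_EXPORT, &cmd);
	return (-1);
}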
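And for the user land side of the mount/unmount tracking, the kqueue part is small. EVFILT_FS and the VQ_MOUNT/VQ_UNMOUNT flags are the interfaces mentioned above; error handling and the actual per-file-system reloading are left out, the printf() is only a placeholder for it.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <sys/mount.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	struct kevent ev;
	int kq;

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");

	/* Register interest in file system events. */
	EV_SET(&ev, 0, EVFILT_FS, EV_ADD | EV_CLEAR, 0, 0, NULL);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");

	for (;;) {
		if (kevent(kq, NULL, 0, &ev, 1, NULL) == -1)
			err(1, "kevent wait");
		if (ev.fflags & (VQ_MOUNT | VQ_UNMOUNT)) {
			/*
			 * Here the real utility would find out which file
			 * system was mounted or unmounted and load or drop
			 * only its settings, as described above.
			 */
			printf("mount table changed (fflags 0x%x)\n",
			    ev.fflags);
		}
	}
}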