From: John Baldwin
To: freebsd-amd64@freebsd.org
Cc: Rick Macklem, freebsd-gnats-submit@freebsd.org, George Breahna
Date: Wed, 12 Oct 2011 09:29:39 -0400
Subject: Re: amd64/161493: NFS v3 directory structure update slow
Message-Id: <201110120929.39901.jhb@freebsd.org>
In-Reply-To: <201110111507.p9BF7Dqw036762@red.freebsd.org>
References: <201110111507.p9BF7Dqw036762@red.freebsd.org>

On Tuesday, October 11, 2011 11:07:13 am George Breahna wrote:
>
> >Number:         161493
> >Category:       amd64
> >Synopsis:       NFS v3 directory structure update slow
> >Confidential:   no
> >Severity:       critical
> >Priority:       high
> >Responsible:    freebsd-amd64
> >State:          open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Tue Oct 11 15:10:07 UTC 2011
> >Closed-Date:
> >Last-Modified:
> >Originator:     George Breahna
> >Release:        9.0 Beta 2
> >Organization:
> >Environment:
> FreeBSD store2 9.0-BETA2 FreeBSD 9.0-BETA2 #0: Sun Sep 18 22:02:45 EDT 2011 pulsar@store2.emailarray.com:/usr/obj/usr/src/sys/PULSAR amd64
> >Description:
> We used to run a NFS server on FreeBSD 6.2 but we built a new box recently and installed 9.0 Beta 2 on it. The data was moved over as it serves as the back-end for a mail system. It runs NFS v3 over TCP only and all the NFS-related processes (rpcbind, mountd, lockd, etc ) run with the -h switch and bind to the local IP address.
>
> The NFS server exports the data to 7 NFS clients ranging from FreeBSD 6.1 to 8.2, the majority being 8.2. The mount on the NFS clients is done simply with -o tcp,rsize=32768,wsize=32768
>
> Usual file operations, such as accessing files, creating directories, removing files, chmod, chown, etc work perfectly but we noticed there were issues in removing directories that contained data. We had a strange error:
>
> rm -rf nick/
> rm: fts_read: Input/output error
>
> Using 'truss' on rm revealed this:
>
> open("..",O_RDONLY,00) ERR#5 'Input/output error'
>
> After much testing and debugging we realized the problem is in the NFS protocol. ( either server or client but we assume server since this used to work very well with FreeBSD 6.2 ). The problem appears to be that NFS does not show the '..' after modifying a directory structure. Take the following example executed on a FreeBSD 8.2 client accessing the NFS share from the 9.0B2 server:
>
> imap5# mkdir test1
> imap5# cd test1
> imap5# touch file1
> imap5# touch file2
> imap5# ls -la
> ls: ..: Input/output error
> total 4
> drwxr-xr-x  2 root  vchkpw  512 Oct 11 10:55 .
> -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file1
> -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file2
>
> Notice the '..' is missing from the display. If we now try and remove the directory 'test1' it will throw the "rm: fts_read: Input/output error" error.
>
> If we wait in between 1 minute and 5 minutes, '..' will eventually appear by itself. During this whole time, '..' effectively exists on the NFS server but it's not displayed by any of the NFS clients.
>
> I can force the NFS client to show it faster by doing an ls -la from the parent level. For example:
>
> imap5# mkdir test1
> imap5# touch test1/file1
> imap5# touch test1/file2
> imap5# touch test1/file3
> imap5# ls -la test1
> total 8
> drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> drwx------  10 vpopmail  vchkpw  1024 Oct 11 10:59 ..
> -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
> imap5# cd test1
> imap5# ls -la
> total 8
> drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> drwx------  10 vpopmail  vchkpw  1024 Oct 11 10:59 ..
> -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
>
> but if we wait 5 seconds after that display and try again:
>
> ls -la
> ls: ..: Input/output error
> total 4
> drwxr-xr-x  2 root  vchkpw  512 Oct 11 10:59 .
> -rw-r--r--  1 root  vchkpw    0 Oct 11 10:59 file1
> -rw-r--r--  1 root  vchkpw    0 Oct 11 10:59 file2
> -rw-r--r--  1 root  vchkpw    0 Oct 11 10:59 file3
>
> Again, if we wait longer ( 1-5 minutes ), the '..' will properly appear in there.
>
> There are no error messages on the console or other log files. This is reproducible 100% of the time with any FreeBSD client. Have tried unmounting/remounting several times without any effect. Also tried different rsize/wsize, no effect. I think there is some delay in updating the directory structure and it's causing this bug.
>
> Here's also some output from nfsstat on the server:
>
>
> Server Info:
>    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>  114731225  20496896 254966151       133  11697392  19963641         0   9228861
>     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>    4313471   1157651        39      1955  16511932  15479669         0 116927742
>      Mknod    Fsstat    Fsinfo  PathConf    Commit
>          0   4748487        48         0  14921747
> Server Ret-Failed
>          0
> Server Faults
>          0
> Server Cache Stats:
>     Inprog      Idem  Non-idem    Misses
>          0         0         0 613368147
> Server Write Gathering:
>   WriteOps  WriteRPC   Opsaved
>   19963641  19963641         0
>
> >How-To-Repeat:
> imap5# mkdir test1
> imap5# cd test1
> imap5# touch file1
> imap5# touch file2
> imap5# ls -la
> ls: ..: Input/output error
> total 4
> drwxr-xr-x  2 root  vchkpw  512 Oct 11 10:55 .
> -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file1
> -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file2
> >Fix:

Can you try using the "old" NFS server as a test?

-- 
John Baldwin
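
For anyone who wants to measure how long '..' stays unavailable on a client, the How-To-Repeat steps can be wrapped in a small script. This is only a rough sketch and is not part of the PR: the function names probe_dotdot and time_dotdot, the test1.$$ directory name, and the 300-second default limit are made up for illustration. It assumes a POSIX sh and that a failed lookup of '..' makes 'ls -d ..' exit non-zero, as the ls output in the report suggests.

```shell
#!/bin/sh
# Hypothetical helpers (not from the PR) for timing the '..' visibility window.

# Print "ok" if '..' can be listed from inside directory $1, else "missing".
probe_dotdot() {
    if (cd "$1" && ls -d ..) >/dev/null 2>&1; then
        echo ok
    else
        echo missing
    fi
}

# Repeat the report's steps under base directory $1, then poll once a second
# (for at most $2 seconds, default 300) until '..' is visible from inside the
# new directory. Prints the final state and the number of seconds waited.
time_dotdot() {
    base=$1
    limit=${2:-300}
    dir="$base/test1.$$"
    mkdir "$dir" || return 1
    # Same order as the report: enter the directory first, then create files.
    (cd "$dir" && touch file1 file2) || return 1
    waited=0
    while [ "$(probe_dotdot "$dir")" != ok ] && [ "$waited" -lt "$limit" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    echo "$(probe_dotdot "$dir") after ${waited}s"
    # Cleanup may itself hit the fts_read error if '..' is still missing.
    rm -rf "$dir"
}
```

After sourcing the file on a client, something like "time_dotdot /path/to/nfs/mount 300" (the mount path is a placeholder) prints the final state and the wait in seconds. On a local filesystem it should report "ok after 0s", which gives a baseline to compare the NFS mount against.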