From: George Breahna
To: Rick Macklem, John Baldwin
Date: Wed, 12 Oct 2011 21:15:45 -0400
Cc: freebsd-amd64@freebsd.org
Subject: RE: amd64/161493: NFS v3 directory structure update slow

Ok, I will try this.

I noticed you wrote another patch, available here, called the dotdot
patch. It modifies another file on top of the one mentioned in the link
you gave me. Is that unnecessary now?

http://people.freebsd.org/~rmacklem/dotdot.patch

George

-----Original Message-----
From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
Sent: Wednesday, October 12, 2011 8:25 PM
To: John Baldwin
Cc: George Breahna; freebsd-gnats-submit@freebsd.org; Rick Macklem; freebsd-amd64@freebsd.org
Subject: Re: amd64/161493: NFS v3 directory structure update slow

John Baldwin wrote:
> On Tuesday, October 11, 2011 11:07:13 am George Breahna wrote:
> >
> > >Number: 161493
> > >Category: amd64
> > >Synopsis: NFS v3 directory structure update slow
> > >Confidential: no
> > >Severity: critical
> > >Priority: high
> > >Responsible: freebsd-amd64
> > >State: open
> > >Quarter:
> > >Keywords:
> > >Date-Required:
> > >Class: sw-bug
> > >Submitter-Id: current-users
> > >Arrival-Date: Tue Oct 11 15:10:07 UTC 2011
> > >Closed-Date:
> > >Last-Modified:
> > >Originator: George Breahna
> > >Release: 9.0 Beta 2
> > >Organization:
> > >Environment:
> > FreeBSD store2 9.0-BETA2 FreeBSD 9.0-BETA2 #0: Sun Sep 18 22:02:45 EDT 2011
> > pulsar@store2.emailarray.com:/usr/obj/usr/src/sys/PULSAR amd64
> > >Description:
> > We used to run a NFS server on FreeBSD 6.2 but we built a new box
> > recently and installed 9.0 Beta 2 on it. The data was moved over as
> > it serves as the back-end for a mail system.
> > It runs NFS v3 over TCP only and all the NFS-related processes
> > (rpcbind, mountd, lockd, etc) run with the -h switch and bind to the
> > local IP address.
> >
> > The NFS server exports the data to 7 NFS clients ranging from
> > FreeBSD 6.1 to 8.2, the majority being 8.2 The mount on the NFS
> > clients is done simply with -o tcp,rsize=32768,wsize=32768
> >
> > Usual file operations, such as accessing files, creating
> > directories, removing files, chmod, chown, etc work perfectly but we
> > noticed there were issues in removing directories that contained
> > data. We had a strange error:
> >
> > rm -rf nick/
> > rm: fts_read: Input/output error
> >
> > Using 'truss' on rm revealed this:
> >
> > open("..",O_RDONLY,00) ERR#5 'Input/output error'
> >
> > After much testing and debugging we realized the problem is in the
> > NFS protocol (either server or client but we assume server since
> > this used to work very well with FreeBSD 6.2). The problem appears
> > to be that NFS does not show the '..' after modifying a directory
> > structure. Take the following example executed on a FreeBSD 8.2
> > client accessing the NFS share from the 9.0B2 server:
> >
> > imap5# mkdir test1
> > imap5# cd test1
> > imap5# touch file1
> > imap5# touch file2
> > imap5# ls -la
> > ls: ..: Input/output error
> > total 4
> > drwxr-xr-x  2 root  vchkpw  512 Oct 11 10:55 .
> > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file1
> > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file2
> >
> > Notice the '..' is missing from the display. If we now try and
> > remove the directory 'test1' it will throw the "rm: fts_read:
> > Input/output error" error.
> >
> > If we wait in between 1 minute and 5 minutes, '..' will eventually
> > appear by itself. During this whole time, '..' effectively exists on
> > the NFS server but it's not displayed by any of the NFS clients.
> >
> > I can force the NFS client to show it faster by doing an ls -la from
> > the parent level.
> > For example:
> >
> > imap5# mkdir test1
> > imap5# touch test1/file1
> > imap5# touch test1/file2
> > imap5# touch test1/file3
> > imap5# ls -la test1
> > total 8
> > drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> > drwx------  10 vpopmail  vchkpw  1024 Oct 11 10:59 ..
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
> > imap5# cd test1
> > imap5# ls -la
> > total 8
> > drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> > drwx------  10 vpopmail  vchkpw  1024 Oct 11 10:59 ..
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
> >
> > but if we wait 5 seconds after that display and try again:
> >
> > ls -la
> > ls: ..: Input/output error
> > total 4
> > drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
> >
> > Again, if we wait longer (1-5 minutes), the '..' will properly
> > appear in there.
> >
> > There are no error messages on the console or other log files. This
> > is reproducible 100% of the time with any FreeBSD client. Have tried
> > unmounting/remounting several times without any effect. Also tried
> > different rsize/wsize, no effect. I think there is some delay in
> > updating the directory structure and it's causing this bug.
> >
> > Here's also some output from nfsstat on the server:
> >
> >
> > Server Info:
> >   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> > 114731225  20496896 254966151       133  11697392  19963641         0   9228861
> >    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> >   4313471   1157651        39      1955  16511932  15479669         0 116927742
> >     Mknod    Fsstat    Fsinfo  PathConf    Commit
> >         0   4748487        48         0  14921747
> > Server Ret-Failed
> >         0
> > Server Faults
> >         0
> > Server Cache Stats:
> >    Inprog      Idem  Non-idem    Misses
> >         0         0         0 613368147
> > Server Write Gathering:
> >  WriteOps  WriteRPC   Opsaved
> >  19963641  19963641         0
> >
> > >How-To-Repeat:
> > imap5# mkdir test1
> > imap5# cd test1
> > imap5# touch file1
> > imap5# touch file2
> > imap5# ls -la
> > ls: ..: Input/output error
> > total 4
> > drwxr-xr-x  2 root  vchkpw  512 Oct 11 10:55 .
> > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file1
> > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file2
> > >Fix:
>
> Can you try using the "old" NFS server as a test?
>
Please make sure you have the patch in r225356 in your server's kernel
sources (it went into head on Sep. 3, but I don't know if your Sep. 11
build would have it?). It fixed a problem that would cause lookup of
".." to fail intermittently, because a field in struct nameidata added
on Aug. 13 wasn't initialized.

You can find the one line patch here:
http://svnweb.freebsd.org/base/head/sys/fs/nfsserver/nfs_nfsdport.c?r1=224911&r2=225356

Please let us know if you have this patch and, if not, apply it and
see if the problem goes away.

Thanks, rick
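
To check whether the lookup of ".." still fails intermittently after patching, the How-To-Repeat steps can be wrapped in a small probe that repeats the check over time. This is only a sketch under assumptions: the function name dotdot_probe, the dotdot_test.$$ directory name, and the default try count and interval are made up for illustration, and it assumes that `ls -d ..` triggers the same lookup of ".." that failed in the report.

```shell
#!/bin/sh
# Hypothetical helper, not part of the PR: probe whether ".." resolves
# inside a freshly created directory under PARENT.
# Usage: dotdot_probe PARENT [TRIES] [INTERVAL_SECONDS]
dotdot_probe() {
    parent=$1
    tries=${2:-10}
    interval=${3:-5}
    dir="$parent/dotdot_test.$$"
    failed=0

    # Same setup as the How-To-Repeat steps.
    mkdir "$dir" || return 2
    touch "$dir/file1" "$dir/file2" || return 2

    i=1
    while [ "$i" -le "$tries" ]; do
        # Subshell keeps the caller's working directory unchanged.
        if ( cd "$dir" && ls -d .. >/dev/null 2>&1 ); then
            echo "$(date '+%H:%M:%S') try $i: .. ok"
        else
            echo "$(date '+%H:%M:%S') try $i: .. FAILED"
            failed=1
        fi
        # Pause between probes, but not after the last one.
        [ "$i" -lt "$tries" ] && sleep "$interval"
        i=$((i + 1))
    done

    # Best-effort cleanup; on an affected client this may itself fail.
    rm -rf "$dir" 2>/dev/null
    return "$failed"
}
```

For example, `dotdot_probe /mnt/nfs 60 5` would probe every 5 seconds for about 5 minutes, which covers the 1-5 minute window described in the report; /mnt/nfs is a placeholder for the client's mount point. The function returns 0 only if every probe succeeded.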