From owner-freebsd-amd64@FreeBSD.ORG Thu Oct 13 02:19:25 2011
From: "George Breahna"
To: "'Rick Macklem'"
Cc: freebsd-amd64@freebsd.org
Date: Wed, 12 Oct 2011 22:19:19 -0400
Subject: RE: amd64/161493: NFS v3 directory structure update slow
Message-ID: <016901cc894e$83e5ed80$8bb1c880$@com>
References: <014f01cc8945$a2f648e0$e8e2daa0$@com> <287177506.3014184.1318472210023.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <287177506.3014184.1318472210023.JavaMail.root@erie.cs.uoguelph.ca>
List-Id: Porting FreeBSD to the AMD64 platform

I was just about to write back to mention this.

I "fixed" the problem by running with -o, but the old server still has
that problem, though very rarely. The new server is affected 100% of the
time for some reason.

If I'm not using NFSv4, is there a performance reason why I should still
use the new NFS server over the old one?

Thank you very much for your help, guys.

George

-----Original Message-----
From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
Sent: Wednesday, October 12, 2011 10:17 PM
To: George Breahna
Cc: freebsd-amd64@freebsd.org; John Baldwin
Subject: Re: amd64/161493: NFS v3 directory structure update slow

George Breahna wrote:
> Ok, I will try this.
>
> I noticed you wrote another patch, available here, called the dotdot
> patch. It modifies another file on top of the one mentioned in the
> link you gave me. Is that unnecessary now?
>
> http://people.freebsd.org/~rmacklem/dotdot.patch
>
The above patch also fixed the old server. See below for the patch
for the old server. (You will be running the old server if you start
both mountd and nfsd with a "-o" option. This happens if you have
oldnfs_server_enable="YES" in your /etc/rc.conf.)
Since "nfsstat -s" shows non-zero counts, you are running the new/default
server. ("nfsstat -o -s" reports stats for the old server, which should
be all zeros if you are running the new/default one.)
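As a rough sketch of the check described above, the helper below reads
nfsstat-style output on standard input and reports whether any counter
is non-zero. The function name nfs_counters_state is hypothetical and
the parsing is an assumption about the output layout (whitespace-separated
labels and integer counts); only the "nfsstat -s" and "nfsstat -o -s"
invocations come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads nfsstat-style text on stdin and prints
# "active" if any purely numeric field is greater than zero, otherwise
# "idle". It does not call nfsstat itself.
nfs_counters_state() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if (($i ~ /^[0-9]+$/) && (($i + 0) > 0))
                    found = 1
        }
        END { print (found ? "active" : "idle") }
    '
}

# Intended use on the server, with the flags quoted in the message:
#   nfsstat -s    | nfs_counters_state   # new/default server
#   nfsstat -o -s | nfs_counters_state   # old server
```

If the first pipeline prints "active" and the second prints "idle", that
would match the situation described here: the new/default server is the
one handling requests.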
In summary, I don't think you are running the old server and only need
to patch the old server if you choose to run it, as jhb@ suggested to
help with isolating the problem. (I would suggest you do that, if the
patch for the new/regular server doesn't fix the problem.)

> George
>
> -----Original Message-----
> From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
> Sent: Wednesday, October 12, 2011 8:25 PM
> To: John Baldwin
> Cc: George Breahna; freebsd-gnats-submit@freebsd.org; Rick Macklem;
> freebsd-amd64@freebsd.org
> Subject: Re: amd64/161493: NFS v3 directory structure update slow
>
> John Baldwin wrote:
> > On Tuesday, October 11, 2011 11:07:13 am George Breahna wrote:
> > >
> > > >Number: 161493
> > > >Category: amd64
> > > >Synopsis: NFS v3 directory structure update slow
> > > >Confidential: no
> > > >Severity: critical
> > > >Priority: high
> > > >Responsible: freebsd-amd64
> > > >State: open
> > > >Quarter:
> > > >Keywords:
> > > >Date-Required:
> > > >Class: sw-bug
> > > >Submitter-Id: current-users
> > > >Arrival-Date: Tue Oct 11 15:10:07 UTC 2011
> > > >Closed-Date:
> > > >Last-Modified:
> > > >Originator: George Breahna
> > > >Release: 9.0 Beta 2
> > > >Organization:
> > > >Environment:
> > > FreeBSD store2 9.0-BETA2 FreeBSD 9.0-BETA2 #0: Sun Sep 18 22:02:45
> > > EDT 2011 pulsar@store2.emailarray.com:/usr/obj/usr/src/sys/PULSAR amd64
> > > >Description:
> > > We used to run an NFS server on FreeBSD 6.2 but we built a new box
> > > recently and installed 9.0 Beta 2 on it. The data was moved over as
> > > it serves as the back-end for a mail system. It runs NFS v3 over TCP
> > > only and all the NFS-related processes (rpcbind, mountd, lockd, etc.)
> > > run with the -h switch and bind to the local IP address.
> > >
> > > The NFS server exports the data to 7 NFS clients ranging from
> > > FreeBSD 6.1 to 8.2, the majority being 8.2. The mount on the NFS
> > > clients is done simply with -o tcp,rsize=32768,wsize=32768
> > >
> > > Usual file operations, such as accessing files, creating
> > > directories, removing files, chmod, chown, etc. work perfectly, but
> > > we noticed there were issues in removing directories that contained
> > > data. We had a strange error:
> > >
> > > rm -rf nick/
> > > rm: fts_read: Input/output error
> > >
> > > Using 'truss' on rm revealed this:
> > >
> > > open("..",O_RDONLY,00) ERR#5 'Input/output error'
> > >
> > > After much testing and debugging we realized the problem is in the
> > > NFS protocol (either server or client, but we assume server since
> > > this used to work very well with FreeBSD 6.2). The problem appears
> > > to be that NFS does not show the '..' after modifying a directory
> > > structure. Take the following example executed on a FreeBSD 8.2
> > > client accessing the NFS share from the 9.0B2 server:
> > >
> > > imap5# mkdir test1
> > > imap5# cd test1
> > > imap5# touch file1
> > > imap5# touch file2
> > > imap5# ls -la
> > > ls: ..: Input/output error
> > > total 4
> > > drwxr-xr-x  2 root  vchkpw  512 Oct 11 10:55 .
> > > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file1
> > > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file2
> > >
> > > Notice the '..' is missing from the display. If we now try and
> > > remove the directory 'test1' it will throw the "rm: fts_read:
> > > Input/output error" error.
> > >
> > > If we wait between 1 and 5 minutes, '..' will eventually appear by
> > > itself. During this whole time, '..' effectively exists on the NFS
> > > server but it's not displayed by any of the NFS clients.
> > >
> > > I can force the NFS client to show it faster by doing an ls -la
> > > from the parent level.
> > > For example:
> > >
> > > imap5# mkdir test1
> > > imap5# touch test1/file1
> > > imap5# touch test1/file2
> > > imap5# touch test1/file3
> > > imap5# ls -la test1
> > > total 8
> > > drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> > > drwx------  10 vpopmail  vchkpw  1024 Oct 11 10:59 ..
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
> > > imap5# cd test1
> > > imap5# ls -la
> > > total 8
> > > drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> > > drwx------  10 vpopmail  vchkpw  1024 Oct 11 10:59 ..
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
> > >
> > > but if we wait 5 seconds after that display and try again:
> > >
> > > ls -la
> > > ls: ..: Input/output error
> > > total 4
> > > drwxr-xr-x   2 root      vchkpw   512 Oct 11 10:59 .
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file1
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file2
> > > -rw-r--r--   1 root      vchkpw     0 Oct 11 10:59 file3
> > >
> > > Again, if we wait longer (1-5 minutes), the '..' will properly
> > > appear in there.
> > >
> > > There are no error messages on the console or other log files.
> > > This is reproducible 100% of the time with any FreeBSD client. We
> > > have tried unmounting/remounting several times without any effect.
> > > We also tried different rsize/wsize, no effect. I think there is
> > > some delay in updating the directory structure and it's causing
> > > this bug.
> > >
> > > Here's also some output from nfsstat on the server:
> > >
> > >
> > > Server Info:
> > >    Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
> > >  114731225  20496896 254966151       133  11697392  19963641         0   9228861
> > >     Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
> > >    4313471   1157651        39      1955  16511932  15479669         0 116927742
> > >      Mknod    Fsstat    Fsinfo  PathConf    Commit
> > >          0   4748487        48         0  14921747
> > > Server Ret-Failed
> > >          0
> > > Server Faults
> > >          0
> > > Server Cache Stats:
> > >     Inprog      Idem  Non-idem    Misses
> > >          0         0         0 613368147
> > > Server Write Gathering:
> > >   WriteOps  WriteRPC   Opsaved
> > >   19963641  19963641         0
> > >
> > > >How-To-Repeat:
> > > imap5# mkdir test1
> > > imap5# cd test1
> > > imap5# touch file1
> > > imap5# touch file2
> > > imap5# ls -la
> > > ls: ..: Input/output error
> > > total 4
> > > drwxr-xr-x  2 root  vchkpw  512 Oct 11 10:55 .
> > > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file1
> > > -rw-r--r--  1 root  vchkpw    0 Oct 11 10:55 file2
> > > >Fix:
> >
> > Can you try using the "old" NFS server as a test?
> >
> Please make sure you have the patch in r225356 in your server's
> kernel sources (it went into head on Sep. 3, but I don't know if
> your Sep. 11 build would have it?). It fixed a problem that would
> cause lookup of ".." to fail intermittently, because a field in
> struct nameidata added on Aug. 13 wasn't initialized.
>
> You can find the one line patch here:
> http://svnweb.freebsd.org/base/head/sys/fs/nfsserver/nfs_nfsdport.c?r1=224911&r2=225356
Clarification, this is the patch for the new/default server. There is
a similar patch for the old server.

For the old server, the patch is:
http://svnweb.freebsd.org/base/head/sys/nfsserver/nfs_serv.c?r1=219028&r2=225356

rick

>
> Please let us know if you have this patch and, if not, apply it
> and see if the problem goes away.
>
> Thanks, rick
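
For re-testing after applying the patch, the How-To-Repeat steps from
the PR could be wrapped in a small script along these lines. This is
only a sketch: the function name check_dotdot and the scratch directory
name are hypothetical, and it uses "ls -ld .." as a stand-in for the
failing lookup of ".." rather than reproducing the exact "ls -la"
session. It should be pointed at a directory on the NFS mount from a
client.

```shell
#!/bin/sh
# Hypothetical helper: check_dotdot DIR
# Creates a scratch directory under DIR, adds two files (as in the PR),
# then checks whether ".." can be looked up from inside it.
# Prints "ok" or "dotdot-failed".
check_dotdot() {
    base=$1
    scratch="$base/dotdot-test.$$"   # hypothetical scratch name
    mkdir "$scratch" || return 2
    (
        cd "$scratch" || exit 2
        touch file1 file2
        # On the affected setup this lookup is where the PR saw
        # "Input/output error".
        if ls -ld .. >/dev/null 2>&1; then
            echo "ok"
        else
            echo "dotdot-failed"
        fi
    )
    status=$?
    # Cleanup may itself fail on an affected client, as the PR
    # reports for "rm -rf".
    rm -rf "$scratch"
    return $status
}

# Example (the mount point is an assumption, not from the thread):
#   check_dotdot /mnt/nfs
```

Since the PR describes the failure appearing a few seconds after the
directory is populated and clearing after 1 to 5 minutes, a single "ok"
is not conclusive; repeating the check with a short delay would give a
better picture.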