Date: Fri, 11 May 2001 14:41:09 +0200 (CEST)
From: Jan Conrad
Subject: NFS performance w. softupdates and va_blocksize
Message-ID: <20010511135932.W450-100000@merlin.th.physik.uni-bonn.de>

Hi,

my message covers two somewhat related issues of NFS under FreeBSD:

(1) Performance loss of 512-byte writes over an NFS mount (with softupdates
    on the server filesystem!)

(2) va_blocksize set to 512 on NFSv3 mounts (client side) (see kern/27232)

Since we only have 4.x-stable and 3.x-stable boxes here, I cannot verify (1)
for -current, so I decided to send this message to -stable.

I discovered these issues because, under some conditions, point (2) leads to
point (1) for libc/stdio routines (see below).


(1) NFS and softupdates

When writing, say, 1 MB of data in 512-byte chunks on an NFS client to an
NFS-mounted file, performance drops by more than a factor of 10 compared to
writing the same data in, say, 8192-byte blocks. What scares me is that this
performance drop is due to disk operations (on our older 3.x boxes you can
easily *hear* it).
In addition, our servers have soft updates on! From what I know of
softupdates, writing file data alone is async anyhow. (And our server is
empty right now (we're still testing) and fast! And there are no fsyncs,
file closes etc.) So where does that disk I/O come from?

Even funnier, you can trigger that behavior with the following little
program:

	/* file, buf (at least 16384 bytes), fd, f and i are declared elsewhere */
	if (((fd = open(file, O_RDWR | O_CREAT | O_APPEND, (mode_t)0644)) >= 0) &&
	    ((f = fdopen(fd, "a+")) != NULL)) {
		for (i = 0; i < 1000; i++) {
			fwrite(buf, (size_t)1, (size_t)16384, f);
		}
		/* ... */
	}

If you do a ktrace, it's just 512-byte writes (see point (2) below).
But if you leave out the O_APPEND, the code does the following

	\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
	\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
	 510 fwrite   RET   write 512/0x200
	 510 fwrite   CALL  lseek(0x3,0,0,0,0x2)
	 510 fwrite   RET   lseek 193536/0x2f400
	 510 fwrite   CALL  write(0x3,0xbfbfb890,0x200)
	 510 fwrite   GIO   fd 3 wrote 512 bytes
	       "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\

and is much faster already. (I don't understand this, but maybe it's
trivial.)


(2) va_blocksize = 512 on NFSv3 mounts (see kern/27232)

OK, since my PR seems to have caused some confusion, let's collect the facts
first.

- On an NFSv3 mount, stat(2) on a regular file gives back st_blksize=512.
  This is due to the assignment

	vap->va_blocksize = NFS_FABLKSIZE;	/* = 512 */

  in nfs_loadattrcache() in sys/nfs/nfs_subs.c.

- On UFS (newfs'd with -b 8192 -f 1024), stat(2) gives st_blksize=8192.
  This is due to the assignment

	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

  in ufs_getattr() in sys/ufs/ufs/ufs_vnops.c.

- st_blksize is used by lib/stdio to determine the default buffer size for
  stream I/O. Under some conditions that triggers (1) above!

OK, let's go to opinions now: I would think that on NFSv3 mounts one should
assign

	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

as well.
However, as I am not a kernel hacker, I would like to ask whether this might
have any negative side effects. (As far as I can tell, va_blocksize isn't
used in the kernel at all... and what is it used for in userland I/O?) May I
test it without blowing away my box? Or is it simply incorrect to do that?
If not, why not commit it?

(Well, I know pine is stupid; unfortunately it's standard at physics
institutes. But there are a lot of pine users out there, and they will
appreciate it immediately, I assure you ;-)

Anyway, I would appreciate your comments and opinions!

regards -Jan

--
Physikalisches Institut der Universitaet Bonn
Nussallee 12
D-53115 Bonn
GERMANY