Date: Fri, 11 May 2001 14:41:09 +0200 (CEST)
From: Jan Conrad <conrad@th.physik.uni-bonn.de>
To: <freebsd-stable@freebsd.org>
Subject: NFS performance w. softupdates and va_blocksize
Message-ID: <20010511135932.W450-100000@merlin.th.physik.uni-bonn.de>
Hi,

my message covers two somewhat related issues of NFS under FreeBSD:

 (1) Performance loss of 512-byte writes over an NFS mount (with
     softupdates on the server filesystem!)

 (2) va_blocksize set to 512 on NFSv3 mounts (client side)
     (see kern/27232)

Since we only have -stable 4.x and 3.x boxes here, I cannot verify (1)
for -current, so I decided to send this message to -stable. I
discovered these issues because under some conditions point (2) leads
to point (1) for libc/stdio routines (see below).

(1) NFS and softupdates

When writing, say, 1 MB of data in 512-byte chunks from an NFS client
to an NFS-mounted file, performance drops by more than a factor of 10
compared to writing the same data in, say, 8192-byte blocks. What
scares me is that this performance drop is due to disk operations (on
our older 3.x boxes you can easily *hear* it). In addition, our
servers have softupdates enabled! From what I know of softupdates,
writing file data alone is async anyhow. (And our server is empty
right now - we're still testing - and fast! And there are no fsyncs,
file closes, etc.) So where does that disk I/O come from?

Even funnier, you can trigger that behavior with the following little
program:

    char buf[16384];
    int fd, i;
    FILE *f;

    if (((fd = open(file, O_RDWR|O_CREAT|O_APPEND, (mode_t) 0000644)) >= 0)
        && ((f = fdopen(fd, "a+")) != NULL)) {
            for (i = 0; i < 1000; i++)
                fwrite(buf, (size_t) 1, (size_t) 16384, f);
            fclose(f);
    }

If you do a ktrace, it's just 512-byte writes (see point (2) below).
But if you leave out the O_APPEND, the code does the following:

   510 fwrite  CALL  lseek(0x3,0,0,0,0x2)
   510 fwrite  RET   lseek 193536/0x2f400
   510 fwrite  CALL  write(0x3,0xbfbfb890,0x200)
   510 fwrite  GIO   fd 3 wrote 512 bytes
       "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..."
   510 fwrite  RET   write 512/0x200

and is much faster already.
(I don't understand this, but maybe it's trivial.)

(2) va_blocksize = 512 on NFSv3 mounts (see kern/27232)

Ok - since my PR seems to have caused some confusion, let's collect
the facts first:

 - On an NFSv3 mount, stat on a regular file gives back
   st_blksize=512. This is due to the assignment

       vap->va_blocksize = NFS_FABLKSIZE;   /* = 512 */

   in nfs_loadattrcache() in sys/nfs/nfs_subs.c.

 - On UFS (newfs'd with -b 8192 -f 1024), stat gives st_blksize=8192.
   This is due to the assignment

       vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

   in ufs_getattr() in sys/ufs/ufs/ufs_vnops.c.

 - st_blksize is used by lib/stdio to determine the default buffer
   size for stream I/O. Under some conditions that triggers (1) above!

Ok, let's go to opinions now. I would think that on NFSv3 mounts one
should assign

    vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

as well. However, as I am not a kernel hacker, I would like to ask
you whether this might have any negative side effects. (As far as I
can tell, va_blocksize isn't used in the kernel at all - and what
about userland I/O?) May I test it without blowing away my box? Or
maybe it's simply incorrect to do that? If not, why not commit it?

(Well - I know pine is stupid - unfortunately it's standard for
physics institutes. But there are a lot of pine users out there; they
will appreciate it immediately, I assure you ;-)

Anyway, I would appreciate your comments and opinions!

regards

 -Jan

--
Physikalisches Institut der Universitaet Bonn
Nussallee 12
D-53115 Bonn
GERMANY