Date:      Fri, 11 May 2001 14:41:09 +0200 (CEST)
From:      Jan Conrad <conrad@th.physik.uni-bonn.de>
To:        <freebsd-stable@freebsd.org>
Subject:   NFS performance w. softupdates and va_blocksize
Message-ID:  <20010511135932.W450-100000@merlin.th.physik.uni-bonn.de>

Hi,

This message covers two somewhat related NFS issues under FreeBSD:

(1) Performance loss of 512-byte writes over an NFS mount
    (with softupdates on the server filesystem!)
(2) va_blocksize set to 512 on NFSv3 mounts (client side)
    (see kern/27232)

Since we only have stable 4.x and 3.x boxes here, I cannot verify (1) for
current, so I decided to send this message to stable.

I discovered these issues because under some conditions point (2) leads to
point (1) for libc/stdio routines (see below).


(1) NFS and softupdates

When writing, say, 1 MB of data in 512-byte chunks on an NFS client to an
NFS-mounted file, performance drops by more than a factor of 10 compared to
writing the data in, say, 8192-byte blocks.
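
For anyone who wants to reproduce the comparison without stdio in the way,
here is a minimal sketch using plain write(2) calls. The function name
timed_chunked_write and the gettimeofday-based timing are assumptions made
for illustration; this is not the exact test that produced the numbers above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `total` zero bytes to `path` using write(2) calls of at most
 * `chunk` bytes each.  Returns the elapsed wall-clock time in seconds
 * (including the final close), or -1.0 on error.
 * Hypothetical helper, for illustration only.
 */
static double
timed_chunked_write(const char *path, size_t total, size_t chunk)
{
    static char buf[65536];          /* zero-filled source buffer */
    struct timeval t0, t1;
    size_t done = 0;
    int fd;

    assert(chunk > 0 && chunk <= sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;

    gettimeofday(&t0, NULL);
    while (done < total) {
        size_t n = (total - done < chunk) ? total - done : chunk;
        ssize_t w = write(fd, buf, n);

        if (w <= 0) {                /* treat short-circuit/failed writes as errors */
            close(fd);
            return -1.0;
        }
        done += (size_t)w;
    }
    if (close(fd) != 0)
        return -1.0;
    gettimeofday(&t1, NULL);

    return (double)(t1.tv_sec - t0.tv_sec) +
        (double)(t1.tv_usec - t0.tv_usec) / 1e6;
}
```

Calling it once with a chunk size of 512 and once with 8192 on a file below
the NFS mount point (the path is whatever your mount uses) should show
whether the slowdown is visible on a given client/server pair.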

What scares me is that this performance drop is due to disk operations
(on our older boxes (3.x) you can easily *hear* it).

In addition, our servers have soft updates on! From what I know of
softupdates, writing file data alone is async anyhow. (And our server is
empty right now - we're still testing - and fast! And there are no
fsyncs, file closes etc.)

So where does that disk I/O come from?


Even funnier, you can trigger that behavior with the following little
program:

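  /* buf is assumed to hold at least 16384 bytes */
  if (((fd = open(file, O_RDWR|O_CREAT|O_APPEND, (mode_t) 0644)) >= 0) &&
      ((f = fdopen(fd, "a+")) != NULL)) {
    /* 1000 fwrite calls of 16384 bytes each, through stdio */
    for (i = 0; i < 1000; i++) {
      fwrite(buf, (size_t) 1, (size_t) 16384, f);
    }
  }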

If you ktrace it, you see nothing but 512-byte writes (see point (2) below).

But if you leave out the O_APPEND, the code does the following:


   [...]
   510 fwrite   RET   write 512/0x200
   510 fwrite   CALL  lseek(0x3,0,0,0,0x2)
   510 fwrite   RET   lseek 193536/0x2f400
   510 fwrite   CALL  write(0x3,0xbfbfb890,0x200)
   510 fwrite   GIO   fd 3 wrote 512 bytes
       "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
   [...]


and is much faster already. (I don't understand this, but maybe it's
trivial)


(2) va_blocksize = 512 on NFSv3 mounts (see kern/27232)


Ok - since my PR seems to have caused some confusion, let's collect the
facts first.

- On an NFSv3 mount, stat on a regular file gives back st_blksize=512.
  This is due to the assignment
        vap->va_blocksize = NFS_FABLKSIZE;
  (=512) in nfs_loadattrcache in sys/nfs/nfs_subs.c

- On UFS (newfs'd with -b 8192 -f 1024), stat gives st_blksize=8192.
  This is due to the assignment
        vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
  in ufs_getattr in sys/ufs/ufs/ufs_vnops.c

- st_blksize is used by libc/stdio to determine the default buffer size for
  stream I/O.
  Under some conditions that triggers (1) above!
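
The first two facts are easy to check from userland with stat(2). A minimal
sketch follows; the helper name reported_blksize is made up for
illustration, everything else is the standard stat interface.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return the st_blksize that stat(2) reports for `path`,
 * or -1 if the stat call fails.
 * Hypothetical helper, for illustration only.
 */
static long
reported_blksize(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) != 0)
        return -1;
    return (long)sb.st_blksize;
}
```

Printing reported_blksize() for one regular file on the UFS filesystem and
one on the NFSv3 mount should, according to the facts listed above, give
8192 and 512 respectively.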


Ok, let's go to opinions now:

I would think that on NFSv3 mounts one should assign
	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
as well.

However, as I am not a kernel hacker, I would like to ask whether this
might have any negative side effects. (As far as I can tell, va_blocksize
isn't used in the kernel at all. And what about userland I/O?)

May I test it without blowing away my box?

Or maybe it's simply incorrect to do that?

If not, why not commit it? (Well - I know pine is stupid - unfortunately
it's the standard for physics institutes. But there are a lot of pine users
out there, and they will appreciate it immediately, I assure you ;-)
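
Independently of any kernel change, a program whose source you control can
avoid depending on st_blksize by choosing its own stdio buffer with
setvbuf(3) before the first I/O on the stream. This is only a sketch of a
userland workaround, not something tested on the setup described here; the
helper name fopen_with_buffer is hypothetical. Whether the resulting writes
really have the larger size is best confirmed with ktrace.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Open `path` with fopen(3) and make the stream fully buffered with the
 * caller-supplied buffer `buf` of `size` bytes, instead of the default
 * buffer size stdio would pick on its own.  The buffer must stay valid
 * until the stream is closed.  Returns NULL on failure.
 * Hypothetical helper, for illustration only.
 */
static FILE *
fopen_with_buffer(const char *path, const char *mode, char *buf, size_t size)
{
    FILE *f;

    assert(buf != NULL && size > 0);

    f = fopen(path, mode);
    if (f == NULL)
        return NULL;

    /* setvbuf must be called before any other operation on the stream */
    if (setvbuf(f, buf, _IOFBF, size) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

A caller would pass, for example, a static 8192-byte array as `buf` and then
use fwrite as usual. This does not help with binaries you cannot rebuild,
which is why the question about the kernel-side default still stands.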


Anyway, I would appreciate your comments and opinions!

regards
-Jan


-- 
Physikalisches Institut der Universitaet Bonn
Nussallee 12
D-53115 Bonn
GERMANY



