Date: Fri, 11 May 2001 14:41:09 +0200 (CEST)
From: Jan Conrad <conrad@th.physik.uni-bonn.de>
To: <freebsd-stable@freebsd.org>
Subject: NFS performance w. softupdates and va_blocksize
Message-ID: <20010511135932.W450-100000@merlin.th.physik.uni-bonn.de>
Hi,
my message covers two somewhat related issues of NFS under FreeBSD:

(1) Performance loss of 512-byte writes over an NFS mount
    (with soft updates on the server filesystem!)
(2) va_blocksize set to 512 on NFSv3 mounts (client side)
    (see kern/27232)
Since we only have 4.x-STABLE and 3.x boxes here, I cannot verify (1) for
-CURRENT, so I decided to send this message to -stable.
I discovered these issues because under some conditions point (2) leads to
point (1) for libc/stdio routines (see below).
(1) NFS and softupdates
When writing, say, 1 MB of data from an NFS client to an NFS-mounted file
in 512-byte chunks, performance drops by more than a factor of 10 compared
to writing the same data in, say, 8192-byte blocks.

What scares me is that this performance drop is due to disk operations
(on our older 3.x boxes you can easily *hear* it).
In addition, our servers have soft updates on! From what I know of
soft updates, writing file data alone is async anyhow. (And our server is
empty right now - we're still testing - and fast! And there are no
fsyncs, file closes, etc.)

So where does that disk I/O come from?
Even funnier, you can trigger that behavior with the following little
program:

    if (((fd = open(file, O_RDWR|O_CREAT|O_APPEND, (mode_t)0644)) >= 0) &&
        ((f = fdopen(fd, "a+")) != NULL)) {
            for (i = 0; i < 1000; i++) {
                    fwrite(buf, (size_t)1, (size_t)16384, f);
            }
            fclose(f);
    }
If you do a ktrace, it's just 512-byte writes (see point (2) below)...

But if you leave away the O_APPEND, the code does the following:

    510 fwrite   CALL  lseek(0x3,0,0,0,0x2)
    510 fwrite   RET   lseek 193536/0x2f400
    510 fwrite   CALL  write(0x3,0xbfbfb890,0x200)
    510 fwrite   GIO   fd 3 wrote 512 bytes
         "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
          \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
    510 fwrite   RET   write 512/0x200

and is already much faster. (I don't understand this, but maybe it's
trivial.)
(2) va_blocksize = 512 on NFSv3 mounts (see kern/27232)
Ok - since my PR seems to have caused some confusion, let's collect the
facts first.
- On an NFSv3 mount, stat on a regular file gives back st_blksize=512.
  This is due to the assignment

      vap->va_blocksize = NFS_FABLKSIZE;

  (= 512) in nfs_loadattrcache() in sys/nfs/nfs_subs.c.
- On UFS (newfs'd with -b 8192 -f 1024), stat gives st_blksize=8192.
  This is due to the assignment

      vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

  in ufs_getattr() in sys/ufs/ufs/ufs_vnops.c.
- st_blksize is used by libc/stdio to determine the default buffer size
  for stream I/O. Under some conditions this triggers (1) above!
Ok, let's move on to opinions now:

I would think that on NFSv3 mounts one should assign

    vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

as well.
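In other words, the change I have in mind is the one-liner below, sketched
(untested!) against nfs_loadattrcache() in sys/nfs/nfs_subs.c - the
surrounding context is approximate:

    -	vap->va_blocksize = NFS_FABLKSIZE;
    +	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

That would make NFSv3 report the mount's I/O size to stat(), the same way
ufs_getattr() does.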
However, as I am not a kernel hacker, I would like to ask you whether
this might have any negative side effects. (As far as I can tell,
va_blocksize isn't used anywhere else in the kernel... and what about
userland I/O?)

May I test it without blowing away my box?

Or maybe it's simply incorrect to do that?
If not, why not commit it? (Well - I know pine is stupid - unfortunately
it's the standard for physics institutes. But there are a lot of pine
users out there; they will appreciate it immediately, I assure you ;-)
Anyway, I would appreciate your comments and opinions!
regards
-Jan
--
Physikalisches Institut der Universitaet Bonn
Nussallee 12
D-53115 Bonn
GERMANY
