Date: Thu, 13 Aug 1998 17:50:01 -0700 (PDT) From: Mika Nystrom <mika@cs.caltech.edu> To: freebsd-bugs@FreeBSD.ORG Subject: Re: kern/7596: serious data integrity problem when reading WHILE writing NFSv3 client-end Message-ID: <199808140050.RAA11449@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
The following reply was made to PR kern/7596; it has been noted by GNATS.
From: Mika Nystrom <mika@cs.caltech.edu>
To: freebsd-gnats-submit@freebsd.org, mika@cs.caltech.edu
Cc: freebsd-bugs@freebsd.org, thepish@freebsd.org, dillon@best.net,
freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
rajit@dogmatix.cs.caltech.edu
Subject: Re: kern/7596: serious data integrity problem when reading WHILE writing NFSv3 client-end
Date: Thu, 13 Aug 1998 17:45:31 -0700
All right,
I have tracked down the cause of the nulls. I wrote a little program
to find the sizes of the nulled sections of the files I was writing:
#include <stdio.h>
main()
{
FILE *fp;
int o,state,when;
char c;
state=1;
o=0;
when=0;
fp=fopen("x","rb");
/* read in bytes */
printf("Reading non-zero...");
while(fread(&c,sizeof(char),1,fp)) {
if(state==1 && c==0) {
state^=1;
printf("%#x bytes.\nGot zero at %#x, reading...",o-when,o);
when=o;
}
if(state==0 && c!=0) {
state^=1;
printf("%#x bytes.\nGot non-zero at %#x, reading...",o-when,o);
when=o;
}
++o;
}
printf("%#x bytes.\n",o-when);
}
and ran this on a file written with the following program:
#include <math.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
main()
{
FILE *fp;
int i=0;
float d=3.01111;
fp=fopen("x","w");
fwrite(&d,sizeof(float),1,fp);
while (1) {
int j;
int stop;
int howmany;
howmany=random()%80+1;
while(i++%howmany) { fwrite(&d,sizeof(float),1,fp); }
/* delay a bit */
#if 0
stop=random()%20000;
#endif
stop=50000;
if (random()%33) fflush(fp);
for (j=0; j<stop; j++) { volatile double e; e=10.1; e=sin(e); }
}
}
while reading repeatedly with the following:
#include <string.h>
#include <stdio.h>
main()
{
#define LEN sizeof(float)
FILE *fp;
char data[LEN];
int len;
fp=fopen("x","r");
while(
fread(data,LEN,1,fp)
);
}
As you can see from write.c, the file should contain NO NULLS.
If you do this on a FreeBSD-current (or probably 2.2) system
that is set up as an NFSv3 client (the server doesn't matter),
you get nulls (or at least I do).
I tracked it down to the following:
sys/nfs/nfs_bio.c: line 1114 and following
if (uiop->uio_resid) {
/*
* If len > 0, there is a hole in the file and
* no writes after the hole have been pushed to
* the server yet.
* Just zero fill the rest of the valid area.
*/
diff = bp->b_bcount - uiop->uio_resid;
len = np->n_size - (((u_quad_t)bp->b_blkno) * DEV_BSIZE
+ diff);
if (len > 0) {
len = min(len, uiop->uio_resid);
bzero((char *)bp->b_data + diff, len);
bp->b_validend = diff + len;
} else
bp->b_validend = diff;
} else
bp->b_validend = bp->b_bcount;
Note the bzero! I inserted a statement in the kernel to print whenever
it was called and with what "len" argument:
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0xe0 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 2 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x8 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 2 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x4 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 2 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x24 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 3 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x14 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 3 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x2c bytes in nfs_bio.c
Aug 13 16:46:10 dogmatix last message repeated 8 times
Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x64 bytes in nfs_bio.c
Aug 13 16:46:11 dogmatix last message repeated 12 times
Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x7c bytes in nfs_bio.c
Aug 13 16:46:11 dogmatix last message repeated 12 times
Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x44 bytes in nfs_bio.c
Aug 13 16:46:11 dogmatix last message repeated 13 times
Aug 13 16:46:12 dogmatix /kernel.nfshack: bzeroing 0x60 bytes in nfs_bio.c
When I ran the null-counter on the file produced, it told me I had a
section of valid data, followed by 0xe0 nulls, followed by valid data,
followed by 0x8 bytes of nulls, etc. Clearly, the bzero is zeroing
valid, uncommitted data.
Hmm it appears this bug might occur with NFSv2 also, and not just v3 as
I thought. Comments?
My fix consists of either:
just getting rid of the bzero. I am concerned that this might let
reads return garbage at the end of the buffer, though, so what I am
doing on our systems is:
getting rid of the whole if (len > 0) section
36 * @(#)nfs_bio.c 8.9 (Berkeley) 3/30/95
37 * $Id: nfs_bio.c,v 1.54 1998/03/28 16:05:05 steve Exp $
[...]
1123 #if 0
1124 if (len > 0) {
1125 static char x[]="with nfsholes_hack";
1126 printf("bzeroing %#x bytes in nfs_bio.c\n",l
en);
1127 len = min(len, uiop->uio_resid);
1128 bzero((char *)bp->b_data + diff, len);
1129 bp->b_validend = diff + len;
1130 } else
1131 #endif
1132 bp->b_validend = diff;
In other words, if I understand the code correctly, the read is short.
However, that really shouldn't matter since the file will be brought
up-to-date eventually, and this is NFS, so that is good enough...
Could someone who understands the purpose of the bzero and/or how this
NFS stuff works comment? (And commit a fix for this ASAP because this
is a real serious data-threatening bug?)
Thanks to David Holland (dholland@eecs.harvard.edu) who helped me
debug this.
Mika
<mika@cs.caltech.edu>
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199808140050.RAA11449>
