Date: Thu, 13 Aug 1998 17:50:01 -0700 (PDT)
From: Mika Nystrom <mika@cs.caltech.edu>
To: freebsd-bugs@FreeBSD.ORG
Subject: Re: kern/7596: serious data integrity problem when reading WHILE writing NFSv3 client-end
Message-ID: <199808140050.RAA11449@freefall.freebsd.org>

The following reply was made to PR kern/7596; it has been noted by GNATS.

From: Mika Nystrom <mika@cs.caltech.edu>
To: freebsd-gnats-submit@freebsd.org, mika@cs.caltech.edu
Cc: freebsd-bugs@freebsd.org, thepish@freebsd.org, dillon@best.net,
    freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
    rajit@dogmatix.cs.caltech.edu
Subject: Re: kern/7596: serious data integrity problem when reading WHILE writing NFSv3 client-end
Date: Thu, 13 Aug 1998 17:45:31 -0700

All right, I have tracked down the cause of the nulls.

I wrote a little program to find the sizes of the nulled sections
of the files I was writing:

    #include <stdio.h>

    int main(void)
    {
        FILE *fp;
        int o, state, when;   /* offset, 1 = in non-zero run, run start */
        char c;

        state = 1; o = 0; when = 0;
        fp = fopen("x", "rb");   /* read in bytes */
        if (fp == NULL) { perror("x"); return 1; }
        printf("Reading non-zero...");
        while (fread(&c, sizeof(char), 1, fp)) {
            if (state == 1 && c == 0) {
                state ^= 1;
                printf("%#x bytes.\nGot zero at %#x, reading...", o - when, o);
                when = o;
            }
            if (state == 0 && c != 0) {
                state ^= 1;
                printf("%#x bytes.\nGot non-zero at %#x, reading...", o - when, o);
                when = o;
            }
            ++o;
        }
        printf("%#x bytes.\n", o - when);
        return 0;
    }

and ran this on a file written with the following program:

    #include <math.h>
    #include <string.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        FILE *fp;
        int i = 0;
        float d = 3.01111;

        fp = fopen("x", "w");
        if (fp == NULL) { perror("x"); return 1; }
        fwrite(&d, sizeof(float), 1, fp);
        while (1) {
            int j;
            int stop;
            int howmany;

            howmany = random() % 80 + 1;
            while (i++ % howmany) {
                fwrite(&d, sizeof(float), 1, fp);
            }
            /* delay a bit */
    #if 0
            stop = random() % 20000;
    #endif
            stop = 50000;
            if (random() % 33)
                fflush(fp);
            for (j = 0; j < stop; j++) {
                volatile double e;
                e = 10.1;
                e = sin(e);
            }
        }
    }

while reading repeatedly with the following:

    #include <string.h>
    #include <stdio.h>

    #define LEN sizeof(float)

    int main(void)
    {
        FILE *fp;
        char data[LEN];

        fp = fopen("x", "r");
        if (fp == NULL) { perror("x"); return 1; }
        while (fread(data, LEN, 1, fp))
            ;
        return 0;
    }

As you can see from write.c, the file should contain NO NULLS.

If you do this on a FreeBSD-current (or probably 2.2) system that
is set up as an NFSv3 client (the server doesn't matter), you get
nulls (or at least I do).

I tracked it down to the following:

sys/nfs/nfs_bio.c: line 1114 and following

    if (uiop->uio_resid) {
        /*
         * If len > 0, there is a hole in the file and
         * no writes after the hole have been pushed to
         * the server yet.
         * Just zero fill the rest of the valid area.
         */
        diff = bp->b_bcount - uiop->uio_resid;
        len = np->n_size - (((u_quad_t)bp->b_blkno) * DEV_BSIZE
            + diff);
        if (len > 0) {
            len = min(len, uiop->uio_resid);
            bzero((char *)bp->b_data + diff, len);
            bp->b_validend = diff + len;
        } else
            bp->b_validend = diff;
    } else
        bp->b_validend = bp->b_bcount;

Note the bzero!

I inserted a statement in the kernel to print whenever it was called
and with what "len" argument:

Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0xe0 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 2 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x8 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 2 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x4 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 2 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x24 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 3 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x14 bytes in nfs_bio.c
Aug 13 16:46:09 dogmatix last message repeated 3 times
Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x2c bytes in nfs_bio.c
Aug 13 16:46:10 dogmatix last message repeated 8 times
Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x64 bytes in nfs_bio.c
Aug 13 16:46:11 dogmatix last message repeated 12 times
Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x7c bytes in nfs_bio.c
Aug 13 16:46:11 dogmatix last message repeated 12 times
Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x44 bytes in nfs_bio.c
Aug 13 16:46:11 dogmatix last message repeated 13 times
Aug 13 16:46:12 dogmatix /kernel.nfshack: bzeroing 0x60 bytes in nfs_bio.c

When I ran the null-counter on the file produced, it told me I had a section of valid
data, followed by 0xe0 nulls, followed by valid data, followed by 0x8
bytes of nulls, etc. These sizes match the bzero lengths logged above.

Clearly, the bzero is zeroing valid, uncommitted data.

Hmm, it appears this bug might occur with NFSv2 also, and not just v3
as I thought.

Comments?

My fix consists of either:

  - just getting rid of the bzero. I am concerned that this might
    let reads return garbage at the end of the buffer, though, so
    what I am doing on our systems is:

  - getting rid of the whole if (len > 0) section:

      36   *      @(#)nfs_bio.c  8.9 (Berkeley) 3/30/95
      37   * $Id: nfs_bio.c,v 1.54 1998/03/28 16:05:05 steve Exp $
    [...]
    1123  #if 0
    1124                  if (len > 0) {
    1125                      static char x[]="with nfsholes_hack";
    1126                      printf("bzeroing %#x bytes in nfs_bio.c\n",len);
    1127                      len = min(len, uiop->uio_resid);
    1128                      bzero((char *)bp->b_data + diff, len);
    1129                      bp->b_validend = diff + len;
    1130                  } else
    1131  #endif
    1132                      bp->b_validend = diff;

In other words, if I understand the code correctly, the read is short.
However, that really shouldn't matter since the file will be brought
up-to-date eventually, and this is NFS, so that is good enough...

Could someone who understands the purpose of the bzero and/or how
this NFS stuff works comment? (And commit a fix for this ASAP,
because this is a really serious data-threatening bug?)

Thanks to David Holland (dholland@eecs.harvard.edu) who helped me debug
this.

     Mika <mika@cs.caltech.edu>