The following reply was made to PR kern/7596; it has been noted by GNATS.

From: Mika Nystrom
To: freebsd-gnats-submit@freebsd.org, mika@cs.caltech.edu
Cc: freebsd-bugs@freebsd.org, thepish@freebsd.org, dillon@best.net,
    freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
    rajit@dogmatix.cs.caltech.edu
Subject: Re: kern/7596: serious data integrity problem when reading WHILE writing NFSv3 client-end
Date: Thu, 13 Aug 1998 17:45:31 -0700

All right, I have tracked down the cause of the nulls.
I wrote a little program to find the sizes of the nulled sections of
the files I was writing:

    #include <stdio.h>

    int main(void)
    {
        FILE *fp;
        int o = 0, when = 0;
        int state = 1;      /* 1: inside non-zero data, 0: inside nulls */
        char c;

        fp = fopen("x", "rb");
        if (fp == NULL) {
            perror("x");
            return 1;
        }

        /* read in bytes */
        printf("Reading non-zero...");
        while (fread(&c, sizeof(char), 1, fp)) {
            if (state == 1 && c == 0) {
                state ^= 1;
                printf("%#x bytes.\nGot zero at %#x, reading...", o - when, o);
                when = o;
            }
            if (state == 0 && c != 0) {
                state ^= 1;
                printf("%#x bytes.\nGot non-zero at %#x, reading...", o - when, o);
                when = o;
            }
            ++o;
        }
        printf("%#x bytes.\n", o - when);
        fclose(fp);
        return 0;
    }

and ran this on a file written with the following program (write.c).
[The header names and the end of this program did not survive in the
archived copy; stdio.h and stdlib.h are the headers the visible code
needs.]

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *fp;
        int i = 0;
        float d = 3.01111;

        fp = fopen("x", "w");
        fwrite(&d, sizeof(float), 1, fp);
        while (1) {
            int j;
            int stop;
            int howmany;

            /* write a random-length burst of floats */
            howmany = random() % 80 + 1;
            while (i++ % howmany) {
                fwrite(&d, sizeof(float), 1, fp);
            }
            /* delay a bit */
    #if 0
            stop = random() % 20000;
    #endif
            stop = 50000;
            if (random() % 33) fflush(fp);
            for (j=0; j [...]

The third program reads the same file while the writer is running:

    #include <stdio.h>

    int main(void)
    {
    #define LEN sizeof(float)
        FILE *fp;
        char data[LEN];

        fp = fopen("x", "r");
        if (fp == NULL) {
            perror("x");
            return 1;
        }
        /* read and discard the file one float at a time */
        while (fread(data, LEN, 1, fp))
            ;
        fclose(fp);
        return 0;
    }

As you can see from write.c, the file should contain NO NULLS.

If you do this on a FreeBSD-current (or probably 2.2) system that is
set up as an NFSv3 client (the server doesn't matter), you get nulls
(or at least I do).

I tracked it down to the following:

sys/nfs/nfs_bio.c: line 1114 and following

    if (uiop->uio_resid) {
        /*
         * If len > 0, there is a hole in the file and
         * no writes after the hole have been pushed to
         * the server yet.
         * Just zero fill the rest of the valid area.
         */
        diff = bp->b_bcount - uiop->uio_resid;
        len = np->n_size - (((u_quad_t)bp->b_blkno) * DEV_BSIZE
            + diff);
        if (len > 0) {
            len = min(len, uiop->uio_resid);
            bzero((char *)bp->b_data + diff, len);
            bp->b_validend = diff + len;
        } else
            bp->b_validend = diff;
    } else
        bp->b_validend = bp->b_bcount;

Note the bzero!
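To see what that arithmetic does to a buffer, here is a minimal user-space sketch. It is not the kernel code: struct model_buf and zero_fill_after_short_read are names made up for illustration, and the parameters merely stand in for b_blkno * DEV_BSIZE, uio_resid and n_size. It assumes, as the analysis in this message suggests, that the buffer can already hold locally written bytes past the point where the server's reply ended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the few struct buf fields involved. */
struct model_buf {
    unsigned char data[64]; /* plays the role of bp->b_data */
    long bcount;            /* bp->b_bcount: bytes requested */
    long validend;          /* bp->b_validend: set by the routine below */
};

/*
 * Mirrors the arithmetic of the quoted fragment.
 *   blk_off: byte offset of the buffer in the file (b_blkno * DEV_BSIZE)
 *   resid:   bytes the server did NOT return (uiop->uio_resid)
 *   n_size:  file size as the client believes it to be (np->n_size)
 * Returns the number of bytes that were zeroed.
 */
static long zero_fill_after_short_read(struct model_buf *bp, long blk_off,
                                       long resid, long n_size)
{
    long diff, len;

    assert(resid >= 0 && resid <= bp->bcount);
    if (resid == 0) {
        bp->validend = bp->bcount;      /* full read, nothing to patch up */
        return 0;
    }
    diff = bp->bcount - resid;          /* bytes the server did return */
    len = n_size - (blk_off + diff);    /* gap up to the client's idea of EOF */
    if (len > 0) {
        if (len > resid)
            len = resid;
        /* This is the step that overwrites whatever the buffer held. */
        memset(bp->data + diff, 0, (size_t)len);
        bp->validend = diff + len;
        return len;
    }
    bp->validend = diff;
    return 0;
}
```

For example, with a 64-byte buffer at file offset 0, a reply that is 24 bytes short and n_size equal to 48, the function zeroes bytes 40 through 47 and sets validend to 48, whatever those bytes held before the call.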
I inserted a statement in the kernel to print whenever it was called
and with what "len" argument:

    Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0xe0 bytes in nfs_bio.c
    Aug 13 16:46:09 dogmatix last message repeated 2 times
    Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x8 bytes in nfs_bio.c
    Aug 13 16:46:09 dogmatix last message repeated 2 times
    Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x4 bytes in nfs_bio.c
    Aug 13 16:46:09 dogmatix last message repeated 2 times
    Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x24 bytes in nfs_bio.c
    Aug 13 16:46:09 dogmatix last message repeated 3 times
    Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x14 bytes in nfs_bio.c
    Aug 13 16:46:09 dogmatix last message repeated 3 times
    Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x2c bytes in nfs_bio.c
    Aug 13 16:46:10 dogmatix last message repeated 8 times
    Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x64 bytes in nfs_bio.c
    Aug 13 16:46:11 dogmatix last message repeated 12 times
    Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x7c bytes in nfs_bio.c
    Aug 13 16:46:11 dogmatix last message repeated 12 times
    Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x44 bytes in nfs_bio.c
    Aug 13 16:46:11 dogmatix last message repeated 13 times
    Aug 13 16:46:12 dogmatix /kernel.nfshack: bzeroing 0x60 bytes in nfs_bio.c

When I ran the null-counter on the file produced, it told me I had
a section of valid data, followed by 0xe0 nulls, followed by valid data,
followed by 0x8 bytes of nulls, etc. The run lengths match the logged
"len" values, so clearly the bzero is zeroing valid, uncommitted data.

Hmm, it appears this bug might occur with NFSv2 also, and not just v3
as I thought.

Comments?

My fix is one of two things. The first is just getting rid of the bzero.
I am concerned that this might let reads return garbage at the end of
the buffer, though, so what I am doing on our systems is getting rid of
the whole if (len > 0) section:

      36   *      @(#)nfs_bio.c       8.9 (Berkeley) 3/30/95
      37   * $Id: nfs_bio.c,v 1.54 1998/03/28 16:05:05 steve Exp $
    [...]
    1123  #if 0
    1124                  if (len > 0) {
    1125                      static char x[]="with nfsholes_hack";
    1126                      printf("bzeroing %#x bytes in nfs_bio.c\n",len);
    1127                      len = min(len, uiop->uio_resid);
    1128                      bzero((char *)bp->b_data + diff, len);
    1129                      bp->b_validend = diff + len;
    1130                  } else
    1131  #endif
    1132                      bp->b_validend = diff;

In other words, if I understand the code correctly, the read is short:
b_validend stops at the last byte the server actually returned.
However, that really shouldn't matter since the file will be brought
up-to-date eventually, and this is NFS, so that is good enough...

Could someone who understands the purpose of the bzero and/or how
this NFS stuff works comment? (And commit a fix for this ASAP, because
this is a really serious data-threatening bug?)

Thanks to David Holland (dholland@eecs.harvard.edu) who helped me debug
this.

     Mika
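The effect of the patched branch on b_validend can be sketched in user space as follows. This is only a model under the reading of the code given in this message, not a proposed kernel change, and the function name validend_without_zero_fill is hypothetical.

```c
#include <assert.h>

/*
 * Model of the branch with the if (len > 0) section compiled out:
 * a short read only records how far the server's data reaches and
 * leaves every byte of the buffer untouched.
 *   bcount: bytes requested (bp->b_bcount)
 *   resid:  bytes the server did not return (uiop->uio_resid)
 * Returns the value b_validend would receive.
 */
static long validend_without_zero_fill(long bcount, long resid)
{
    long diff;

    assert(resid >= 0 && resid <= bcount);
    if (resid) {
        diff = bcount - resid;  /* bytes the server did return */
        return diff;            /* short read: valid only up to diff */
    }
    return bcount;              /* full read: the whole buffer is valid */
}
```

With a 64-byte request that comes back 24 bytes short, this yields 40, so the region past byte 40 is simply not marked valid rather than being filled with zeroes.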