Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 13 Aug 1998 17:50:01 -0700 (PDT)
From:      Mika Nystrom <mika@cs.caltech.edu>
To:        freebsd-bugs@FreeBSD.ORG
Subject:   Re: kern/7596: serious data integrity problem when reading WHILE writing NFSv3 client-end
Message-ID:  <199808140050.RAA11449@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help
The following reply was made to PR kern/7596; it has been noted by GNATS.

From: Mika Nystrom <mika@cs.caltech.edu>
To: freebsd-gnats-submit@freebsd.org, mika@cs.caltech.edu
Cc: freebsd-bugs@freebsd.org, thepish@freebsd.org, dillon@best.net,
        freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
        rajit@dogmatix.cs.caltech.edu
Subject: Re: kern/7596: serious data integrity problem when reading WHILE writing NFSv3 client-end
Date: Thu, 13 Aug 1998 17:45:31 -0700

 All right,
    I have tracked down the cause of the nulls.  I wrote a little program
 to find the sizes of the nulled sections of the files I was writing:
 
 #include <stdio.h>
 main()
 {
   FILE *fp;
   int o,state,when;
   char c;
   state=1;
   o=0;
   when=0;
   fp=fopen("x","rb");
 
   /* read in bytes */
   
   printf("Reading non-zero...");
   while(fread(&c,sizeof(char),1,fp)) {
     if(state==1 && c==0) {
        state^=1; 
        printf("%#x bytes.\nGot zero at %#x, reading...",o-when,o);
        when=o;
     }
     if(state==0 && c!=0) {
        state^=1; 
        printf("%#x bytes.\nGot non-zero at %#x, reading...",o-when,o);
        when=o;
     }
     ++o;
   }
   printf("%#x bytes.\n",o-when);
 }
 
 and ran this on a file written with the following program:
 
 #include <math.h>
 #include <string.h>
 #include <stdlib.h>
 #include <stdio.h>
 main()
 {
    FILE *fp;
    int i=0;
    float d=3.01111;
    fp=fopen("x","w");
    fwrite(&d,sizeof(float),1,fp); 
    while (1) {
       int j;
       int stop;
       int howmany;
       howmany=random()%80+1;
       while(i++%howmany) { fwrite(&d,sizeof(float),1,fp); }
       
       /* delay a bit */
 #if 0
       stop=random()%20000;
 #endif
       stop=50000;
       if (random()%33) fflush(fp);
       for (j=0; j<stop; j++) { volatile double e; e=10.1; e=sin(e); }
    }
 }
 
 while reading repeatedly with the following:
 
 #include <string.h>
 #include <stdio.h>
 
 main()
 {
 
 #define LEN sizeof(float)
    FILE *fp;
    char data[LEN];
 
    int len;
    fp=fopen("x","r");
 
    while(
      fread(data,LEN,1,fp)
    );
 }
 
 As you can see from write.c, the file should contain NO NULLS.
 If you do this on a FreeBSD-current (or probably 2.2) system
 that is set up as an NFSv3 client (the server doesn't matter),
 you get nulls (or at least I do).
 
 I tracked it down to the following:
 
 sys/nfs/nfs_bio.c: line 1114 and following
 
                     if (uiop->uio_resid) {
                         /*
                          * If len > 0, there is a hole in the file and
                          * no writes after the hole have been pushed to
                          * the server yet.
                          * Just zero fill the rest of the valid area.
                          */      
 
                         diff = bp->b_bcount - uiop->uio_resid;
                         len = np->n_size - (((u_quad_t)bp->b_blkno) * DEV_BSIZE 
                                 + diff);
                         if (len > 0) {
                             len = min(len, uiop->uio_resid);
                             bzero((char *)bp->b_data + diff, len);
                             bp->b_validend = diff + len;
                         } else      
                             bp->b_validend = diff;
                     } else
                         bp->b_validend = bp->b_bcount;
 
 Note the bzero!  I inserted a statement in the kernel to print whenever
 it was called and with what "len" argument:
 
 Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0xe0 bytes in nfs_bio.c
 Aug 13 16:46:09 dogmatix last message repeated 2 times 
 Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x8 bytes in nfs_bio.c 
 Aug 13 16:46:09 dogmatix last message repeated 2 times
 Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x4 bytes in nfs_bio.c 
 Aug 13 16:46:09 dogmatix last message repeated 2 times
 Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x24 bytes in nfs_bio.c
 Aug 13 16:46:09 dogmatix last message repeated 3 times
 Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x14 bytes in nfs_bio.c
 Aug 13 16:46:09 dogmatix last message repeated 3 times
 Aug 13 16:46:09 dogmatix /kernel.nfshack: bzeroing 0x2c bytes in nfs_bio.c
 Aug 13 16:46:10 dogmatix last message repeated 8 times
 Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x64 bytes in nfs_bio.c
 Aug 13 16:46:11 dogmatix last message repeated 12 times
 Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x7c bytes in nfs_bio.c
 Aug 13 16:46:11 dogmatix last message repeated 12 times
 Aug 13 16:46:11 dogmatix /kernel.nfshack: bzeroing 0x44 bytes in nfs_bio.c
 Aug 13 16:46:11 dogmatix last message repeated 13 times
 Aug 13 16:46:12 dogmatix /kernel.nfshack: bzeroing 0x60 bytes in nfs_bio.c
 
 When I ran the null-counter on the file produced, it told me I had a 
 section of valid data, followed by 0xe0 nulls, followed by valid data,
 followed by 0x8 bytes of nulls, etc.  Clearly, the bzero is zeroing
 valid, uncommitted data.
 
 Hmm it appears this bug might occur with NFSv2 also, and not just v3 as
 I thought.  Comments?
 
 My fix consists of either:
 
 just getting rid of the bzero.  I am concerned that this might let 
 reads return garbage at the end of the buffer, though, so what I am
 doing on our systems is:
 
 getting rid of the whole if (len > 0) section
 
      36  *      @(#)nfs_bio.c   8.9 (Berkeley) 3/30/95
      37  * $Id: nfs_bio.c,v 1.54 1998/03/28 16:05:05 steve Exp $
 [...]
    1123 #if 0
    1124                         if (len > 0) {
    1125                             static char x[]="with nfsholes_hack";
    1126                             printf("bzeroing %#x bytes in nfs_bio.c\n",l
 en);
    1127                             len = min(len, uiop->uio_resid);
    1128                             bzero((char *)bp->b_data + diff, len);
    1129                             bp->b_validend = diff + len;
    1130                         } else
    1131 #endif              
    1132                             bp->b_validend = diff;
 
 In other words, if I understand the code correctly, the read is short.
 However, that really shouldn't matter since the file will be brought
 up-to-date eventually, and this is NFS, so that is good enough...
 
 Could someone who understands the purpose of the bzero and/or how this
 NFS stuff works comment?  (And commit a fix for this ASAP because this
 is a real serious data-threatening bug?)
 
 Thanks to David Holland (dholland@eecs.harvard.edu) who helped me
 debug this.
 
     Mika
     <mika@cs.caltech.edu>
 
 
 

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199808140050.RAA11449>