Date: Fri, 12 Dec 1997 17:39:32 -0500 (EST)
From: Bill Paul <wpaul@skynet.ctr.columbia.edu>
To: current@freebsd.org, toor@dyson.iquest.net
Subject: mmap() + NFS problems persist
Message-ID: <199712122239.RAA08542@skynet.ctr.columbia.edu>

Yes, I'm still here. I'm still seeing problems with FreeBSD-current,
mmap() and NFS. I've upgraded to a 3.0 SNAP from Dec. 9th and the trouble
is still there. Again, there are two possible failure modes: in the first
case, the process becomes wedged and unkillable, with ps -alx showing the
wait channel to be "vmopar", and in the second case, the whole system
wedges because nfs_bioread() gets caught in an endless loop.

I've been trying to investigate the latter problem since it's more of
a show-stopper, but my RPC clue isn't enough to help me understand the
inner workings of the VM system, which I think is partly where the
problem lies (inasmuch as it relates to NFS anyway).

Within nfs_bioread(), there is a large do {} while(); loop, inside which
you have the following code:

	switch (vp->v_type) {
	case VREG:
		nfsstats.biocache_reads++;
		lbn = uio->uio_offset / biosize;
		on = uio->uio_offset & (biosize - 1);
		not_readin = 1;

		[...]

		/*
		 * If the block is in the cache and has the required data
		 * in a valid region, just copy it out.
		 * Otherwise, get the block and write back/read in,
		 * as required.
		 */
again:
		bufsize = biosize;
		if ((off_t)(lbn + 1) * biosize > np->n_size &&
		    (off_t)(lbn + 1) * biosize - np->n_size < biosize) {
			bufsize = np->n_size - lbn * biosize;
			bufsize = (bufsize + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
		}
		bp = nfs_getcacheblk(vp, lbn, bufsize, p);
		if (!bp)
			return (EINTR);
		/*
		 * If we are being called from nfs_getpages, we must
		 * make sure the buffer is a vmio buffer. The vp will
		 * already be setup for vmio but there may be some old
		 * non-vmio buffers attached to it.
		 */
		if (getpages && !(bp->b_flags & B_VMIO)) {
#ifdef DIAGNOSTIC
			printf("nfs_bioread: non vmio buf found, discarding\n");
#endif
			bp->b_flags |= B_NOCACHE;
			bp->b_flags |= B_INVAFTERWRITE;
			if (bp->b_dirtyend > 0) {
				if ((bp->b_flags & B_DELWRI) == 0)
					panic("nfsbioread");
				if (VOP_BWRITE(bp) == EINTR)
					return (EINTR);
			} else
				brelse(bp);
			goto again;
		}
		if ((bp->b_flags & B_CACHE) == 0) {
			bp->b_flags |= B_READ;
			bp->b_flags &= ~(B_DONE | B_ERROR | B_INVAL);
			not_readin = 0;
			vfs_busy_pages(bp, 0);
			error = nfs_doio(bp, cred, p);
			if (error) {
				brelse(bp);
				return (error);
			}
		}
		if (bufsize > on) {
			n = min((unsigned)(bufsize - on), uio->uio_resid);
		} else {
			n = 0;
		}
		diff = np->n_size - uio->uio_offset;
		if (diff < n)
			n = diff;
		if (not_readin && n > 0) {
			if (on < bp->b_validoff || (on + n) > bp->b_validend) {
				bp->b_flags |= B_NOCACHE;
				bp->b_flags |= B_INVAFTERWRITE;
				if (bp->b_dirtyend > 0) {
					if ((bp->b_flags & B_DELWRI) == 0)
						panic("nfsbioread");
					if (VOP_BWRITE(bp) == EINTR)
						return (EINTR);
				} else
					brelse(bp);
				goto again;	<----- LOOPS HERE!!
			}
		}
		vp->v_lastr = lbn;
		diff = (on >= bp->b_validend) ? 0 : (bp->b_validend - on);
		if (diff < n)
			n = diff;
		break;
	case VLNK:
		[...]

The spot labeled 'LOOPS HERE!!' is where the infinite loop happens. The
code calls nfs_getcacheblk() to return the block from the mmap()ed file
that is being faulted in, but it is not happy with the block that it gets,
so it branches back around to the 'again' label, which causes
nfs_getcacheblk() to be called again, but it returns the same block which
it doesn't like, and the cycle repeats.

The buffer that is returned has bp->b_validoff == 0 and
bp->b_validend == 4096. Also, bufsize == 8192 and uio_offset == 4096.
The value for uio_offset makes sense based on the behavior of my program:
the page fault happens when the program first crosses the boundary into
the second 4096-byte page. However, each time nfs_getcacheblk() is called,
it returns the same buffer with bp->b_validoff == 0 and
bp->b_validend == 4096.

These numbers are not what the code expects (I suppose bp->b_validend
would need to be 8192), so it releases the block and tries again. Why it
never gets the right block I don't know.

To help debug this (I hope) I've slapped together the source for the
program I have that wedges my system; you can get it from:

	ftp.ctr.columbia.edu:/pub/misc/freebsd/mmap_locktest.tar.gz
	skynet.ctr.columbia.edu:/pub/freebsd/mmap_locktest.tar.gz
	freebsd.org:/home/wpaul/mmap_locktest.tar.gz

This should compile standalone (i.e. without any other NIS+ cruft).
Please excuse all the NIS+ headers. To reproduce the bug, do the following:

- Configure a FreeBSD 3.0 host as an NFS client
- Unpack the source code onto an NFS filesystem and type 'make.' This
  will build a program called (stupidly enough) 'foo.'
- Run 'foo' several times.

When you run it, you will see things like this:

[/proj/mbone/nis/usr.sbin/nis_cachemgr/mmap_test]:mbone{217}% ./foo
FSIZE: 8192
data SIZE 1044
truncating...
mmaping...
copying...
unmapping
ver: 2
FSIZE: 8192
data SIZE 1132
truncating...
mmaping...
copying...
unmapping

The first time you run 'foo' it will create a file in the current
directory called 'test.' The program attempts to read and write data
into this file via mmap(). Each time you run the program, 'SIZE' will
increase. SIZE indicates the amount of data written into the mmap()ed
region. After you run 'foo' enough times, 'SIZE' will approach 4096
bytes. Once SIZE gets to be just under 4096 bytes, run foo one more time,
and the system will hang. At least, it does for me. Note that you have
to run the program a few dozen times in succession to get it up to
4096 bytes.

Again, what seems to happen is that the crossing into the next 4K page
causes a page fault because the second 4K region isn't in core. This
causes vm_fault() to eventually call into nfs_getpages(), which calls
into nfs_bioread(), which gets all tied up in knots.

Hopefully somebody besides me can duplicate this.

Hey, wait: ampere runs 3.0-current...

Uh-oh. I'm in trouble.

Uhmm... could somebody reboot ampere? :(

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================