Date:      Fri, 12 Dec 1997 17:39:32 -0500 (EST)
From:      Bill Paul <wpaul@skynet.ctr.columbia.edu>
To:        current@freebsd.org, toor@dyson.iquest.net
Subject:   mmap() + NFS problems persist
Message-ID:  <199712122239.RAA08542@skynet.ctr.columbia.edu>

Yes, I'm still here.

I'm still seeing problems with FreeBSD-current, mmap() and NFS. I've
upgraded to a 3.0 SNAP from Dec 9th, and the trouble is still there.
Again, there are two possible failure modes: in the first case, the
process becomes wedged and unkillable with ps -alx showing wait channel
to be "vmopar", and in the second case, the whole system wedges because
nfs_bioread() gets caught in an endless loop.

I've been trying to investigate the latter problem since it's more of
a show-stopper, but my RPC clue isn't enough to help me understand the
inner workings of the VM system, which I think is partly where the
problem lies (inasmuch as it relates to NFS anyway).

Within nfs_bioread(), there is a large do {} while(); loop, inside which
you have the following code:

            switch (vp->v_type) {
            case VREG:
                nfsstats.biocache_reads++;
                lbn = uio->uio_offset / biosize;
                on = uio->uio_offset & (biosize - 1);
                not_readin = 1;
[...]
                /*
                 * If the block is in the cache and has the required data
                 * in a valid region, just copy it out.
                 * Otherwise, get the block and write back/read in,
                 * as required.
                 */
again:
                bufsize = biosize;
                if ((off_t)(lbn + 1) * biosize > np->n_size && 
                    (off_t)(lbn + 1) * biosize - np->n_size < biosize) {
                        bufsize = np->n_size - lbn * biosize;
                        bufsize = (bufsize + DEV_BSIZE - 1) & ~(DEV_BSIZE - 1);
                }
                bp = nfs_getcacheblk(vp, lbn, bufsize, p);
                if (!bp)
                        return (EINTR);
                /*
                 * If we are being called from nfs_getpages, we must
                 * make sure the buffer is a vmio buffer.  The vp will
                 * already be setup for vmio but there may be some old
                 * non-vmio buffers attached to it.
                 */
                if (getpages && !(bp->b_flags & B_VMIO)) {
#ifdef DIAGNOSTIC
                        printf("nfs_bioread: non vmio buf found, discarding\n");
#endif
                        bp->b_flags |= B_NOCACHE;
                        bp->b_flags |= B_INVAFTERWRITE;
                        if (bp->b_dirtyend > 0) {
                                if ((bp->b_flags & B_DELWRI) == 0)
                                        panic("nfsbioread");
                                if (VOP_BWRITE(bp) == EINTR)
                                        return (EINTR);
                        } else
                                brelse(bp);
                        goto again;
                }
                if ((bp->b_flags & B_CACHE) == 0) {
                        bp->b_flags |= B_READ;
                        bp->b_flags &= ~(B_DONE | B_ERROR | B_INVAL);
                        not_readin = 0;
                        vfs_busy_pages(bp, 0);
                        error = nfs_doio(bp, cred, p);
                        if (error) {
                            brelse(bp);
                            return (error);
                        }
                }
                if (bufsize > on) {
                        n = min((unsigned)(bufsize - on), uio->uio_resid);
                } else {
                        n = 0;
                }
                diff = np->n_size - uio->uio_offset;
                if (diff < n)
                        n = diff;
                if (not_readin && n > 0) {
                        if (on < bp->b_validoff ||
                            (on + n) > bp->b_validend) {
                                bp->b_flags |= B_NOCACHE;
                                bp->b_flags |= B_INVAFTERWRITE;
                                if (bp->b_dirtyend > 0) {
                                    if ((bp->b_flags & B_DELWRI) == 0)
                                        panic("nfsbioread");
                                    if (VOP_BWRITE(bp) == EINTR)
                                        return (EINTR);
                                } else
                                    brelse(bp);
                                goto again;  <----- LOOPS HERE!!
                        }
                }
                vp->v_lastr = lbn;
                diff = (on >= bp->b_validend) ? 0 : (bp->b_validend - on);
                if (diff < n)
                        n = diff;
                break;
            case VLNK:
[...]

The spot labeled 'LOOPS HERE!!' is where the infinite loop happens.
The code calls nfs_getcacheblk() to get the block of the mmap()ed
file that is being faulted in, but it isn't happy with the block it
gets, so it branches back to the 'again' label. That calls
nfs_getcacheblk() again, which returns the same block that the code
didn't like the first time, and the cycle repeats. The buffer that is
returned has bp->b_validoff == 0 and bp->b_validend == 4096. Also,
bufsize == 8192 and uio_offset == 4096. The value for uio_offset makes
sense based on the behavior of my program: the page fault happens when
the program first crosses the boundary into the second 4096-byte page.
However, each time nfs_getcacheblk() is called, it returns the same
buffer with bp->b_validoff == 0 and bp->b_validend == 4096. These
numbers are not what the code expects (I suppose bp->b_validend would
need to be 8192), so it releases the block and tries again.

Why it never gets the right block I don't know.

To help debug this (I hope), I've slapped together the source for the
program that wedges my system. You can get it from:

ftp.ctr.columbia.edu:/pub/misc/freebsd/mmap_locktest.tar.gz
skynet.ctr.columbia.edu:/pub/freebsd/mmap_locktest.tar.gz
freebsd.org:/home/wpaul/mmap_locktest.tar.gz

This should compile standalone (i.e. without any other NIS+ cruft).
Please excuse all the NIS+ headers.

To reproduce the bug, do the following:

- Configure a FreeBSD 3.0 host as an NFS client
- Unpack the source code onto an NFS filesystem and type 'make'.
  This will build a program called (stupidly enough) 'foo'.
- Run 'foo' several times. When you run it, you will see things
  like this:

  [/proj/mbone/nis/usr.sbin/nis_cachemgr/mmap_test]:mbone{217}% ./foo
  FSIZE: 8192 data SIZE 1044
  truncating...
  mmaping...
  copying...
  unmapping
  ver: 2
  FSIZE: 8192 data SIZE 1132
  truncating...
  mmaping...
  copying...
  unmapping

  The first time you run 'foo' it will create a file in the current
  directory called 'test'. The program attempts to read and write
  data into this file via mmap(). Each time you run the program, 'SIZE' 
  will increase. SIZE indicates the amount of data written into the 
  mmap()ed region. After you run 'foo' enough times, 'SIZE' will approach
  4096 bytes. Once SIZE gets to be just under 4096 bytes, run foo one
  more time, and the system will hang. At least, it does for me.
  Note that you have to run the program a few dozen times in succession
  to get it up to 4096 bytes.

Again, what seems to happen is that the crossing into the next 4K page
causes a page fault because the second 4K region isn't in core. This
causes vm_fault() to eventually call into nfs_getpages(), which calls
into nfs_bioread(), which gets all tied up in knots.

Hopefully somebody besides me can duplicate this. Hey, wait: ampere
runs 3.0-current...

Uh-oh. I'm in trouble.

Uhmm... could somebody reboot ampere? :(

-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================


