Date:      Fri, 28 Nov 1997 23:11:11 -0500 (EST)
From:      Bill Paul <wpaul@skynet.ctr.columbia.edu>
To:        current@freebsd.org
Subject:   mmap() + NFS == &*$%@$!!!
Message-ID:  <199711290411.XAA08991@skynet.ctr.columbia.edu>

I did a quick install of the Nov. 28th -current snapshot to test a few
NIS+ related things and ran into a problem using mmap() with NFS. The
problem is that it's apparently not too hard at all to wedge either a
process or the entire system when using mmap() with files on a remote
NFS filesystem.

The NIS+ cache manager is supposed to maintain a cache of recently
discovered directory_obj structures and other data; this helps speed
up the binding process for NIS+ clients. (The actual binding procedure
can be arduous and time-consuming; caching the results saves other
processes on the local host from repeating the same binding operations
over and over again.) What I wanted to do was describe a hash table
structure in terms of RPC language and use rpcgen(1) to create XDR
filters so that I could encode the entire mess of binding information
maintained internally by nis_cachemgr and dump it straight into a file.

There are two ways to do this: one is using the standard I/O package
and xdrstdio_create(). This lets you direct the XDR stream to a FILE *
stream and thence to disk. The other way is to use mmap() and
xdrmem_create(): xdrmem_create() lets you write the XDR stream into
a memory buffer. If the memory buffer in question is an mmap()ed
region of a file, then the XDR stream will also be written to the
file when you msync() or munmap() the buffer.
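
To make the second approach concrete, here is a minimal sketch of writing
to a file through a MAP_SHARED mapping. It is not taken from nis_cachemgr:
store_via_mmap() is a hypothetical helper, and a plain memcpy() stands in
for the XDR encoder that xdrmem_create() would point at the same buffer.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original program): store len bytes
 * into a file through a MAP_SHARED mapping instead of write(2).
 * Returns 0 on success, -1 on failure.
 */
int
store_via_mmap(const char *path, const void *buf, size_t len)
{
	int fd;
	void *addr;
	int rc = -1;

	if (len == 0)
		return (-1);
	fd = open(path, O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		return (-1);

	/* Set the file length to len so the whole mapping is backed. */
	if (ftruncate(fd, (off_t)len) == -1)
		goto out;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		goto out;

	/* An XDR stream created over this buffer would be encoded here. */
	memcpy(addr, buf, len);

	/* Ask for the dirty pages to be written back, then unmap. */
	if (msync(addr, len, MS_SYNC) == 0)
		rc = 0;
	if (munmap(addr, len) == -1)
		rc = -1;
out:
	close(fd);
	return (rc);
}
```

Calling store_via_mmap("test", data, size) on a local filesystem should
leave the bytes readable with an ordinary read(2) afterwards.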

I've been tinkering with using xdrmem_create() and mmap() and so far
things seem to work okay as long as the file I use is on a local UFS
filesystem. Today I tried it with an NFS filesystem, and almost right
away I ran into trouble.

I'm trying to map the file in chunks of 4096 bytes (since that's the
system page size). Initially, the file doesn't exist, so I try to
create it and use ftruncate(2) to expand it to 8192 bytes. As new
binding information is obtained, it needs to be written out to the
file. Using xdr_sizeof(), I can tell how much file space I will need.
If the existing file is big enough, I mmap() it, use the XDR filter
to encode the data into the mmap()ed region, then munmap() the
region. If the data is too large to fit in the existing file, I
use ftruncate() to expand the file to the next 8192-byte chunk, then
mmap(), XDR, and munmap() again.
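
The procedure just described can be sketched roughly as follows. This is
an illustration under assumptions, not the cache manager's code:
chunk_round() and dump_region() are hypothetical names, CHUNK is the
8192-byte growth step from the text, and memcpy() stands in for the XDR
encode step.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <string.h>
#include <unistd.h>

#define CHUNK	8192UL	/* assumed growth granularity, as in the text */

/* Round need up to a whole number of chunks (at least one chunk). */
unsigned long
chunk_round(unsigned long need)
{
	if (need == 0)
		return (CHUNK);
	return (((need + CHUNK - 1) / CHUNK) * CHUNK);
}

/*
 * Hypothetical helper: make sure the file behind fd can hold len bytes,
 * growing it to the next chunk boundary if needed, then copy buf in
 * through a shared mapping.  Returns 0 on success, -1 on failure.
 */
int
dump_region(int fd, const void *buf, unsigned long len)
{
	struct stat st;
	unsigned long fsize;
	void *addr;

	if (len == 0 || fstat(fd, &st) == -1)
		return (-1);
	fsize = (unsigned long)st.st_size;

	/* Only change the file size when the data no longer fits. */
	if (len > fsize) {
		fsize = chunk_round(len);
		if (ftruncate(fd, (off_t)fsize) == -1)
			return (-1);
	}

	addr = mmap(NULL, fsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		return (-1);
	memcpy(addr, buf, len);	/* stands in for the XDR encode step */
	return (munmap(addr, fsize));
}
```

Skipping ftruncate() when the data already fits is a choice made in this
sketch; on a local filesystem, writing 4000 bytes to a new file should
leave it at 8192 bytes, and 9000 bytes should grow it to 16384.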

One of two things can happen when using NFS: either the sample caching
program becomes wedged and can't be killed (ps -alx shows it to be in
the 'vmopar' state) or else the process wedges the system and I have
to reboot. Using the kernel debugger, I've determined in the latter
case that the system wedges because nfs_bioread() becomes stuck in a
loop.

This case is a bit peculiar. The conditions seem to be as follows:

- There's an existing file on disk of 8192 bytes in size. The actual
  data occupies slightly less than 4096 bytes of this.

- The program opens the file, mmap()s it, and uses the XDR filter
  to read the data from the mmap()ed region and convert it into
  the original hash table and associated stuff. Once the loading
  is done, the region is munmap()ed.

- The program adds an entry to the table, which makes the total size
  slightly more than 4096 bytes.

- The program uses ftruncate() to adjust the file size. This is
  actually a mistake here: the data still fits in 8192 bytes so the
  file size doesn't change. The program then mmap()s the file and
  calls the XDR filter to start encoding the data into the mmap()ed
  region.

- This is where the system wedges. A stack trace shows that vm_fault()
  has led to a call to nfs_getpages(), which in turn calls nfs_bioread().
  Nfs_bioread() gets caught in a loop, calling nfs_getcacheblk() over
  and over again. Somehow or other, nfs_getcacheblk() fails, so
  nfs_bioread() calls brelse(), then loops around and calls nfs_getcacheblk()
  again, which fails again, etc...

The problem seems to happen when the XDR filter crosses the boundary
between the 4096 byte pages. Once it passes 4096 bytes, I think it
tries to fault in the second page, and this is where it gets trapped
in a loop.
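
The arithmetic behind that guess is simple enough to write down. The
helpers below are hypothetical and only illustrate which page a given
byte offset lands in, assuming the 4096-byte page size mentioned above;
they say nothing about what the kernel actually does on the fault.

```c
#include <stddef.h>

/* Index of the page that holds byte offset off. */
size_t
page_index(size_t off, size_t pagesize)
{
	return (off / pagesize);
}

/* Number of pages touched by writing len bytes starting at offset 0. */
size_t
pages_touched(size_t len, size_t pagesize)
{
	if (len == 0)
		return (0);
	return ((len - 1) / pagesize + 1);
}
```

With these definitions, an encoding of just under 4096 bytes stays in
page 0, while one of slightly more than 4096 bytes touches two pages, so
the first store past offset 4095 is the first access to page 1.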

Unfortunately, I don't have a sample program yet that duplicates
this condition: the test program that triggers it has lots of NIS+
junk in it which I need to strip out. With luck I'll be able to do
this over the weekend, but I'll need to go back to campus to test it
(the test machine is in one of the labs, and I don't want to wedge
it from home since I won't be able to reboot it). I do have a sample
program that duplicates the first problem, where the process becomes
wedged and unkillable. To test this program, compile it, then cd to
an NFS filesystem (it doesn't matter if it's NFS v2 or v3, or what
OS the server is running). Run the program, and if your system is like
mine, it will hang and refuse to die.

I could easily just switch to using xdrstdio_create() and fopen()
but that would be giving up too easily: this stuff is supposed to work,
and I'm not going to stop making noise about it until it does. Anyone
else notice this sort of thing, or am I the only one who's bothered
to play with mmap() and NFS at the same time?

-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/cdefs.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/fcntl.h>
#include <sys/stat.h>

#define CB_CHUNKSIZE		8192

static char			*stuff;
static unsigned long		stuff_size;

static int			cachebind_fd;
static caddr_t			laddr;
static unsigned long		csize;

int nis_cachebind_dump()
{
	unsigned long		tsize, fsize;
	char			*ptr;

	tsize = stuff_size;

	/* round the file size up to the next CB_CHUNKSIZE boundary */
	if (tsize > CB_CHUNKSIZE)
		fsize = ((tsize + CB_CHUNKSIZE - 1) / CB_CHUNKSIZE) *
		    CB_CHUNKSIZE;
	else
		fsize = CB_CHUNKSIZE;

	/* the sizes are unsigned long, so print them with %lu */
	printf("FSIZE: %lu SIZE %lu\n", fsize, tsize);
	printf("MOD: %lu\n", tsize % CB_CHUNKSIZE);
	printf("MOD: %lu\n", CB_CHUNKSIZE % tsize);

	/* unmap the region */
	munmap(laddr, csize);

	/* change file size */
	ftruncate(cachebind_fd, fsize);
	csize = fsize;

	/* remap */
	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
			MAP_SHARED, cachebind_fd, 0);

	/* don't copy into a mapping that was never established */
	if (laddr == MAP_FAILED)
		return(-1);

	bcopy(stuff, laddr, stuff_size);

	/* unmap again */
	munmap(laddr, csize);

	return(0);
}

int nis_cachebind_init(fname)
	char			*fname;
{
	cachebind_fd = open(fname, O_RDWR|O_CREAT, 0644);
	if (cachebind_fd == -1)
		return(-1);

	stuff_size = 4000;
	stuff = calloc(1, stuff_size);

	/* nis_cachebind_dump() takes no arguments; pass its result back */
	return(nis_cachebind_dump());
}

int nis_cachebind_load(fname)
	char			*fname;
{
	struct stat		st;

	if (stat(fname, &st) == -1) {
		if (errno != ENOENT)
			return(-1);
		return(nis_cachebind_init(fname));
	}

	csize = st.st_size;
	cachebind_fd = open(fname, O_RDWR, 0644);
	if (cachebind_fd == -1)
		return(-1);

	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
			MAP_SHARED, cachebind_fd, 0);

	if (laddr == MAP_FAILED) {
		close(cachebind_fd);
		return(-1);
	}

	stuff = calloc(1, csize);
	stuff_size = csize;
	bcopy(laddr, stuff, csize);

	munmap(laddr, csize);

	return(0);
}

int main()
{
	char			*ptr;
	int			i;

	nis_cachebind_load("test");

	/* dirty the memory */

	ptr = stuff;
	for (i = 0; i < stuff_size; i++) {
		*ptr = '?';
		ptr++;
	}

	/* dump */
	nis_cachebind_dump("test");

	/* make it bigger */
	stuff_size += 4000;
	stuff = realloc(stuff, stuff_size);

	/* dirty it again */
	ptr = stuff;
	for (i = 0; i < stuff_size; i++) {
		*ptr = '?';
		ptr++;
	}
	/* dump */
	nis_cachebind_dump("test");
}
