Date:      Fri, 28 Nov 1997 23:05:03 -0600
From:      Karl Denninger  <karl@mcs.net>
To:        Bill Paul <wpaul@skynet.ctr.columbia.edu>
Cc:        current@freebsd.org
Subject:   Re: mmap() + NFS == &*$%@$!!!
Message-ID:  <19971128230503.28898@mcs.net>
In-Reply-To: <199711290411.XAA08991@skynet.ctr.columbia.edu>; from Bill Paul on Fri, Nov 28, 1997 at 11:11:11PM -0500
References:  <199711290411.XAA08991@skynet.ctr.columbia.edu>

Try using "cvs update -r1.41 nfs_bio.c" in your "sys/nfs" directory and
rebuild the kernel, then see if the problem goes away.

There are some SERIOUS willies in versions of that file after 1.41.

-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly to FULL DS-3 Service
			     | NEW! K56Flex support on ALL modems
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

On Fri, Nov 28, 1997 at 11:11:11PM -0500, Bill Paul wrote:
> I did a quick install of the Nov. 28th -current snapshot to test a few
> NIS+ related things and ran into a problem using mmap() with NFS. The
> problem is that it's apparently not too hard at all to wedge either a
> process or the entire system when using mmap() with files on a remote
> NFS filesystem.
> 
> The NIS+ cache manager is supposed to maintain a cache of recently
> discovered directory_obj structures and other data; this helps speed
> up the binding process for NIS+ clients. (The actual binding procedure
> can be arduous and time-consuming; caching the results saves other
> processes on the local host from repeating the same binding operations
> over and over again.) What I wanted to do was describe a hash table
> structure in terms of RPC language and use rpcgen(1) to create XDR
> filters so that I could encode the entire mess of binding information
> maintained internally by nis_cachemgr and dump it straight into a file.
> 
> There are two ways to do this: one is using the standard I/O package
> and xdrstdio_create(). This lets you direct the XDR stream to a FILE *
> stream and thence to disk. The other way is to use mmap() and
> xdrmem_create(): xdrmem_create() lets you write the XDR stream into
> a memory buffer. If the memory buffer in question is an mmap()ed
> region of a file, then the XDR stream will also be written to the
> file when you msync() or munmap() the buffer.
> 
> I've been tinkering with using xdrmem_create() and mmap() and so far
> things seem to work okay as long as the file I use is on a local UFS
> filesystem. Today I tried it with an NFS filesystem, and almost right
> away I ran into trouble.
> 
> I'm trying to map the file in chunks of 4096 bytes (since that's the
> system page size). Initially, the file doesn't exist, so I try to
> create it and use ftruncate(2) to expand it to 8192 bytes. As new
> binding information is obtained, it needs to be written out to the
> file. Using xdr_sizeof(), I can tell how much file space I will need.
> If the existing file is big enough, I mmap() it, use the XDR filter
> to encode the data into the mmap()ed region, then munmap() the
> region. If the data is too large to fit in the existing file, I
> use ftruncate() to expand the file to the next 8192-byte chunk, then
> mmap(), XDR, and munmap() again.
> 
> One of two things can happen when using NFS: either the sample caching
> program becomes wedged and can't be killed (ps -alx shows it to be in
> the 'vmopar' state) or else the process wedges the system and I have
> to reboot. Using the kernel debugger, I've determined in the latter
> case that the system wedges because nfs_bioread() becomes stuck in a
> loop.
> 
> This case is a bit peculiar. The conditions seem to be as follows:
> 
> - There's an existing file on disk of 8192 bytes in size. The actual
>   data occupies slightly less than 4096 bytes of this.
> 
> - The program opens the file, mmap()s it, and uses the XDR filter
>   to read the data from the mmap()ed region and convert it into
>   the original hash table and associated stuff. Once the loading
>   is done, the region is munmap()ed.
> 
> - The program adds an entry to the table, which makes the total size
>   slightly more than 4096 bytes.
> 
> - The program uses ftruncate() to adjust the file size. This is
>   actually a mistake here: the data still fits in 8192 bytes so the
>   file size doesn't change. The program then mmap()s the file and
>   calls the XDR filter to start encoding the data into the mmap()ed
>   region.
> 
> - This is where the system wedges. A stack trace shows that vm_fault()
>   has led to a call to nfs_getpages(), which in turn calls nfs_bioread().
>   Nfs_bioread() gets caught in a loop, calling nfs_getcacheblk() over
>   and over again. Somehow or other, nfs_getcacheblk() fails, so
>   nfs_bioread() calls brelse(), then loops around and calls nfs_getcacheblk()
>   again, which fails again, etc...
> 
> The problem seems to happen when the XDR filter crosses the boundary
> between the 4096 byte pages. Once it passes 4096 bytes, I think it
> tries to fault in the second page, and this is where it gets trapped
> in a loop.
> 
> Unfortunately, I don't have a sample program yet that duplicates
> this condition: the test program that triggers it has lots of NIS+
> junk in it which I need to strip out. With luck I'll be able to do
> this over the weekend, but I'll need to go back to campus to test it
> (the test machine is in one of the labs, and I don't want to wedge
> it from home since I won't be able to reboot it). I do have a sample
> program that duplicates the first problem, where the process becomes
> wedged and unkillable. To test this program, compile it, then cd to
> an NFS filesystem (it doesn't matter if it's NFS v2 or v3, or what
> OS the server is running). Run the program, and if your system is like
> mine, it will hang and refuse to die.
> 
> I could easily just switch to using xdrstdio_create() and fopen()
> but that would be giving up too easily: this stuff is supposed to work,
> and I'm not going to stop making noise about it until it does. Anyone
> else notice this sort of thing, or am I the only one who's bothered
> to play with mmap() and NFS at the same time?
> 
> -Bill
> 
> -- 
> =============================================================================
> -Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
> Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
> Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
> =============================================================================
>  "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
> =============================================================================
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
> #include <errno.h>
> #include <sys/cdefs.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <sys/fcntl.h>
> #include <sys/stat.h>
> 
> #define CB_CHUNKSIZE		8192
> 
> static char			*stuff;
> static unsigned long		stuff_size;
> 
> static int			cachebind_fd;
> static caddr_t			laddr;
> static unsigned long		csize;
> 
> int nis_cachebind_dump()
> {
> 	unsigned long		tsize, fsize;
> 	char			*ptr;
> 
> 	tsize = stuff_size;
> 
> 	if (tsize > CB_CHUNKSIZE)
> 		fsize = ((tsize + CB_CHUNKSIZE - 1) / CB_CHUNKSIZE) * CB_CHUNKSIZE;
> 	else
> 		fsize = CB_CHUNKSIZE;
> 
> 	printf("FSIZE: %lu SIZE %lu\n", fsize, tsize);
> 	printf("MOD: %lu\n", tsize % CB_CHUNKSIZE);
> 	printf("MOD: %lu\n", CB_CHUNKSIZE % tsize);
> 
> 	/* unmap the region */
> 	munmap(laddr, csize);
> 
> 	/* change file size */
> 	ftruncate(cachebind_fd, fsize);
> 	csize = fsize;
> 
> 	/* remap */
> 	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
> 			MAP_SHARED, cachebind_fd, 0);
> 
> 	bcopy(stuff, laddr, stuff_size);
> 
> 	/* unmap again */
> 	munmap(laddr, csize);
> 
> 	return(0);
> }
> 
> int nis_cachebind_init(fname)
> 	char			*fname;
> {
> 	cachebind_fd = open(fname, O_RDWR|O_CREAT, 0644);
> 	if (cachebind_fd == -1)
> 		return(-1);
> 
> 	stuff_size = 4000;
> 	stuff = calloc(1, stuff_size);
> 
> 	return(nis_cachebind_dump(fname));
> }
> 
> int nis_cachebind_load(fname)
> 	char			*fname;
> {
> 	struct stat		st;
> 
> 	if (stat(fname, &st) == -1) {
> 		if (errno != ENOENT)
> 			return(-1);
> 		return(nis_cachebind_init(fname));
> 	}
> 
> 	csize = st.st_size;
> 	cachebind_fd = open(fname, O_RDWR, 0644);
> 	if (cachebind_fd == -1)
> 		return(-1);
> 
> 	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
> 			MAP_SHARED, cachebind_fd, 0);
> 
> 	if (laddr == MAP_FAILED) {
> 		close(cachebind_fd);
> 		return(-1);
> 	}
> 
> 	stuff = calloc(1, csize);
> 	stuff_size = csize;
> 	bcopy(laddr, stuff, csize);
> 
> 	munmap(laddr, csize);
> 
> 	return(0);
> }
> 
> int main()
> {
> 	char			*ptr;
> 	int			i;
> 
> 	nis_cachebind_load("test");
> 
> 	/* dirty the memory */
> 
> 	ptr = stuff;
> 	for (i = 0; i < stuff_size; i++) {
> 		*ptr = '?';
> 		ptr++;
> 	}
> 
> 	/* dump */
> 	nis_cachebind_dump("test");
> 
> 	/* make it bigger */
> 	stuff_size += 4000;
> 	stuff = realloc(stuff, stuff_size);
> 
> 	/* dirty it again */
> 	ptr = stuff;
> 	for (i = 0; i < stuff_size; i++) {
> 		*ptr = '?';
> 		ptr++;
> 	}
> 	/* dump */
> 	nis_cachebind_dump("test");
> }
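
For reference, the grow-then-map cycle described in the quoted message
(ftruncate() to the next 8192-byte chunk, mmap(), write, munmap()) can be
sketched roughly as follows. This is only a sketch under assumptions, not
the nis_cachemgr code: round_to_chunk() and dump_region() are hypothetical
names, memcpy() stands in for the XDR filter, and every system call is
checked so that a failure is reported instead of being silently ignored.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK	8192UL	/* same granularity as CB_CHUNKSIZE in the sample */

/* Hypothetical helper: round n up to a multiple of CHUNK (minimum one chunk). */
unsigned long
round_to_chunk(unsigned long n)
{
	if (n == 0)
		return (CHUNK);
	return (((n + CHUNK - 1) / CHUNK) * CHUNK);
}

/*
 * Hypothetical helper: size the file behind fd to hold len bytes,
 * map it shared, copy buf into the mapping, flush, and unmap.
 * Returns the new file size, or 0 if any step fails.
 */
unsigned long
dump_region(int fd, const void *buf, unsigned long len)
{
	unsigned long fsize = round_to_chunk(len);
	void *addr;

	assert(buf != NULL);

	if (ftruncate(fd, (off_t)fsize) == -1)
		return (0);

	addr = mmap(NULL, fsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED)
		return (0);

	/* The real program would run its XDR filter over addr here. */
	memcpy(addr, buf, len);

	/* Push the dirty pages to the file before dropping the mapping. */
	if (msync(addr, fsize, MS_SYNC) == -1) {
		munmap(addr, fsize);
		return (0);
	}
	if (munmap(addr, fsize) == -1)
		return (0);

	return (fsize);
}
```

Called as dump_region(fd, stuff, stuff_size) on a descriptor opened with
O_RDWR, this follows the same sequence as nis_cachebind_dump() above, so
on an affected kernel it would presumably hang at the same point; the
checks only help tell a failed call apart from a wedge.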


