Message-ID: <19971128230503.28898@mcs.net>
Date: Fri, 28 Nov 1997 23:05:03 -0600
From: Karl Denninger
To: Bill Paul
Cc: current@freebsd.org
Subject: Re: mmap() + NFS == &*$%@$!!!
References: <199711290411.XAA08991@skynet.ctr.columbia.edu>

Try using "cvs update -r1.41 nfs_bio.c" in your "sys/nfs" directory and
rebuild the kernel, then see if the problem goes away.

There are some SERIOUS willies in versions of that file after 1.41.

-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly to FULL DS-3 Service
                             | NEW! K56Flex support on ALL modems
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

On Fri, Nov 28, 1997 at 11:11:11PM -0500, Bill Paul wrote:
> I did a quick install of the Nov. 28th -current snapshot to test a few
> NIS+ related things and ran into a problem using mmap() with NFS.
> The problem is that it's apparently not too hard at all to wedge either a
> process or the entire system when using mmap() with files on a remote
> NFS filesystem.
>
> The NIS+ cache manager is supposed to maintain a cache of recently
> discovered directory_obj structures and other data; this helps speed
> up the binding process for NIS+ clients. (The actual binding procedure
> can be arduous and time-consuming; caching the results saves other
> processes on the local host from repeating the same binding operations
> over and over again.) What I wanted to do was describe a hash table
> structure in terms of RPC language and use rpcgen(1) to create XDR
> filters so that I could encode the entire mess of binding information
> maintained internally by nis_cachemgr and dump it straight into a file.
>
> There are two ways to do this: one is using the standard I/O package
> and xdrstdio_create(). This lets you direct the XDR stream to a FILE *
> stream and thence to disk. The other way is to use mmap() and
> xdrmem_create(): xdrmem_create() lets you write the XDR stream into
> a memory buffer. If the memory buffer in question is an mmap()ed
> region of a file, then the XDR stream will also be written to the
> file when you msync() or munmap() the buffer.
>
> I've been tinkering with using xdrmem_create() and mmap() and so far
> things seem to work okay as long as the file I use is on a local UFS
> filesystem. Today I tried it with an NFS filesystem, and almost right
> away I ran into trouble.
>
> I'm trying to map the file in chunks of 4096 bytes (since that's the
> system page size). Initially, the file doesn't exist, so I try to
> create it and use ftruncate(2) to expand it to 8192 bytes. As new
> binding information is obtained, it needs to be written out to the
> file. Using xdr_sizeof(), I can tell how much file space I will need.
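The mmap()-backed write path described above can be sketched without the XDR layer. This is a minimal sketch under stated assumptions, not the NIS+ cache manager's code: a plain memcpy() stands in for the xdrmem_create() encoder, and `write_via_mmap()` is a hypothetical helper name. It grows the file with ftruncate() before mapping it, because touching a mapped page that lies entirely past end-of-file raises SIGBUS, and it calls msync() before munmap() so the dirty pages are flushed to the file.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original program): write len bytes
 * from buf into the file at path through a MAP_SHARED mapping of maplen
 * bytes. Assumes maplen >= len and maplen > 0. Returns 0 on success.
 */
static int write_via_mmap(const char *path, const void *buf, size_t len,
                          size_t maplen)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    /* The file must cover the whole mapping before we touch it. */
    if (ftruncate(fd, (off_t)maplen) == -1) {
        close(fd);
        return -1;
    }

    void *addr = mmap(NULL, maplen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Stand-in for encoding with an XDR memory stream. */
    memcpy(addr, buf, len);

    /* Flush dirty pages to the file, then tear down the mapping. */
    int rc = msync(addr, maplen, MS_SYNC);
    munmap(addr, maplen);
    close(fd);
    return rc;
}
```

Reading the file back with ordinary read() after the call is a quick way to confirm the data went through the mapping; on a local UFS filesystem this is the case Bill reports as working.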
> If the existing file is big enough, I mmap() it, use the XDR filter
> to encode the data into the mmap()ed region, then munmap() the
> region. If the data is too large to fit in the existing file, I
> use ftruncate() to expand the file to the next 8192-byte chunk, then
> mmap(), XDR, and munmap() again.
>
> One of two things can happen when using NFS: either the sample caching
> program becomes wedged and can't be killed (ps -alx shows it to be in
> the 'vmopar' state) or else the process wedges the system and I have
> to reboot. Using the kernel debugger, I've determined in the latter
> case that the system wedges because nfs_bioread() becomes stuck in a
> loop.
>
> This case is a bit peculiar. The conditions seem to be as follows:
>
> - There's an existing file on disk of 8192 bytes in size. The actual
>   data occupies slightly less than 4096 bytes of this.
>
> - The program opens the file, mmap()s it, and uses the XDR filter
>   to read the data from the mmap()ed region and convert it into
>   the original hash table and associated stuff. Once the loading
>   is done, the region is munmap()ed.
>
> - The program adds an entry to the table, which makes the total size
>   slightly more than 4096 bytes.
>
> - The program uses ftruncate() to adjust the file size. This is
>   actually a mistake here: the data still fits in 8192 bytes, so the
>   file size doesn't change. The program then mmap()s the file and
>   calls the XDR filter to start encoding the data into the mmap()ed
>   region.
>
> - This is where the system wedges. A stack trace shows that vm_fault()
>   has led to a call to nfs_getpages(), which in turn calls nfs_bioread().
>   Nfs_bioread() gets caught in a loop, calling nfs_getcacheblk() over
>   and over again. Somehow or other, nfs_getcacheblk() fails, so
>   nfs_bioread() calls brelse(), then loops around and calls
>   nfs_getcacheblk() again, which fails again, etc...
>
> The problem seems to happen when the XDR filter crosses the boundary
> between the 4096-byte pages.
> Once it passes 4096 bytes, I think it tries to fault in the second
> page, and this is where it gets trapped in a loop.
>
> Unfortunately, I don't have a sample program yet that duplicates
> this condition: the test program that triggers it has lots of NIS+
> junk in it which I need to strip out. With luck I'll be able to do
> this over the weekend, but I'll need to go back to campus to test it
> (the test machine is in one of the labs, and I don't want to wedge
> it from home since I won't be able to reboot it). I do have a sample
> program that duplicates the first problem, where the process becomes
> wedged and unkillable. To test this program, compile it, then cd to
> an NFS filesystem (it doesn't matter if it's NFS v2 or v3, or what
> OS the server is running). Run the program, and if your system is like
> mine, it will hang and refuse to die.
>
> I could easily just switch to using xdrstdio_create() and fopen(),
> but that would be giving up too easily: this stuff is supposed to work,
> and I'm not going to stop making noise about it until it does. Has anyone
> else noticed this sort of thing, or am I the only one who's bothered
> to play with mmap() and NFS at the same time?
>
> -Bill
>
> --
> =============================================================================
> -Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
> Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
> Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
> =============================================================================
>  "It is not I who am crazy; it is I who am mad!"
>                                             - Ren Hoek, "Space Madness"
> =============================================================================
>
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> #define CB_CHUNKSIZE 8192
>
> static char *stuff;			/* in-memory copy of the cache data */
> static unsigned long stuff_size;
>
> static int cachebind_fd;		/* descriptor for the cache file */
> static caddr_t laddr;			/* current mapping, if any */
> static unsigned long csize;		/* size of the current mapping */
>
> /* Resize the file and write the in-memory data out through a mapping. */
> int nis_cachebind_dump(void)
> {
> 	unsigned long tsize, fsize;
>
> 	tsize = stuff_size;
>
> 	/*
> 	 * Note: this adds the remainder rather than rounding up to a
> 	 * CB_CHUNKSIZE boundary (e.g. 12192 bytes gives 16192, not 16384).
> 	 */
> 	if (tsize > CB_CHUNKSIZE)
> 		fsize = tsize + (tsize % CB_CHUNKSIZE);
> 	else
> 		fsize = CB_CHUNKSIZE;
>
> 	printf("FSIZE: %lu SIZE %lu\n", fsize, tsize);
> 	printf("MOD: %lu\n", tsize % CB_CHUNKSIZE);
> 	printf("MOD: %lu\n", CB_CHUNKSIZE % tsize);
>
> 	/* unmap the previous region, if any */
> 	munmap(laddr, csize);
>
> 	/* change file size */
> 	ftruncate(cachebind_fd, fsize);
> 	csize = fsize;
>
> 	/* remap */
> 	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
> 	    MAP_SHARED, cachebind_fd, 0);
> 	if (laddr == MAP_FAILED)
> 		return(-1);
>
> 	/* copy the data into the file through the mapping */
> 	bcopy(stuff, laddr, stuff_size);
>
> 	/* unmap again */
> 	munmap(laddr, csize);
>
> 	return(0);
> }
>
> /* Create the cache file and write an initial chunk of zeroed data. */
> int nis_cachebind_init(char *fname)
> {
> 	cachebind_fd = open(fname, O_RDWR|O_CREAT, 0644);
> 	if (cachebind_fd == -1)
> 		return(-1);
>
> 	stuff_size = 4000;
> 	stuff = calloc(1, stuff_size);
>
> 	return(nis_cachebind_dump());
> }
>
> /* Load an existing cache file, or create it if it doesn't exist. */
> int nis_cachebind_load(char *fname)
> {
> 	struct stat st;
>
> 	if (stat(fname, &st) == -1) {
> 		if (errno != ENOENT)
> 			return(-1);
> 		return(nis_cachebind_init(fname));
> 	}
>
> 	csize = st.st_size;
> 	cachebind_fd = open(fname, O_RDWR, 0644);
> 	if (cachebind_fd == -1)
> 		return(-1);
>
> 	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
> 	    MAP_SHARED, cachebind_fd, 0);
>
> 	if (laddr == MAP_FAILED) {
> 		close(cachebind_fd);
> 		return(-1);
> 	}
>
> 	/* copy the file contents into memory, then drop the mapping */
> 	stuff = calloc(1, csize);
> 	stuff_size = csize;
> 	bcopy(laddr, stuff, csize);
>
> 	munmap(laddr, csize);
>
> 	return(0);
> }
>
> int main(void)
> {
> 	char *ptr;
> 	unsigned long i;
>
> 	nis_cachebind_load("test");
>
> 	/* dirty the memory */
> 	ptr = stuff;
> 	for (i = 0; i < stuff_size; i++) {
> 		*ptr = '?';
> 		ptr++;
> 	}
>
> 	/* dump */
> 	nis_cachebind_dump();
>
> 	/* make it bigger */
> 	stuff_size += 4000;
> 	stuff = realloc(stuff, stuff_size);
>
> 	/* dirty it again */
> 	ptr = stuff;
> 	for (i = 0; i < stuff_size; i++) {
> 		*ptr = '?';
> 		ptr++;
> 	}
>
> 	/* dump */
> 	nis_cachebind_dump();
>
> 	return(0);
> }
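Bill's description says the file is grown to "the next 8192-byte chunk". A minimal sketch of that rounding follows; `round_up_chunk()` is a hypothetical helper, not part of the test program, and it assumes a nonzero chunk size. Adding the remainder instead (size + size % chunk) is not equivalent: for a 12192-byte payload it gives 16192, which is not even a multiple of the 4096-byte page size, whereas rounding up gives 16384.

```c
#define CB_CHUNKSIZE 8192UL

/*
 * Hypothetical helper: round size up to the next multiple of chunk.
 * A size that is already a multiple is returned unchanged; 0 maps to
 * one full chunk, mirroring the "minimum one chunk" rule in the test
 * program. Assumes chunk > 0.
 */
static unsigned long round_up_chunk(unsigned long size, unsigned long chunk)
{
    if (size == 0)
        return chunk;
    return ((size + chunk - 1) / chunk) * chunk;
}
```

With CB_CHUNKSIZE at 8192, a 4000-byte payload maps to 8192 and an 8193-byte payload maps to 16384, so every file size produced this way is also a whole number of 4096-byte pages.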