From owner-freebsd-current Fri Nov 28 20:10:33 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id UAA01280
          for current-outgoing; Fri, 28 Nov 1997 20:10:33 -0800 (PST)
          (envelope-from owner-freebsd-current)
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
          by hub.freebsd.org (8.8.7/8.8.7) with SMTP id UAA01272
          for ; Fri, 28 Nov 1997 20:10:23 -0800 (PST)
          (envelope-from wpaul@skynet.ctr.columbia.edu)
Received: (from wpaul@localhost)
          by skynet.ctr.columbia.edu (8.6.12/8.6.9) id XAA08991
          for current@freebsd.org; Fri, 28 Nov 1997 23:11:13 -0500
From: Bill Paul
Message-Id: <199711290411.XAA08991@skynet.ctr.columbia.edu>
Subject: mmap() + NFS == &*$%@$!!!
To: current@freebsd.org
Date: Fri, 28 Nov 1997 23:11:11 -0500 (EST)
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-freebsd-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I did a quick install of the Nov. 28th -current snapshot to test a few
NIS+ related things and ran into a problem using mmap() with NFS. The
problem is that it's apparently not too hard at all to wedge either a
process or the entire system when using mmap() with files on a remote
NFS filesystem.

The NIS+ cache manager is supposed to maintain a cache of recently
discovered directory_obj structures and other data; this helps speed up
the binding process for NIS+ clients. (The actual binding procedure can
be arduous and time-consuming; caching the results saves other processes
on the local host from repeating the same binding operations over and
over again.)

What I wanted to do was describe a hash table structure in terms of RPC
language and use rpcgen(1) to create XDR filters so that I could encode
the entire mess of binding information maintained internally by
nis_cachemgr and dump it straight into a file. There are two ways to do
this: one is using the standard I/O package and xdrstdio_create().
This lets you direct the XDR stream to a FILE * stream and thence to
disk. The other way is to use mmap() and xdrmem_create(): xdrmem_create()
lets you write the XDR stream into a memory buffer. If the memory buffer
in question is an mmap()ed region of a file, then the XDR stream will
also be written to the file when you msync() or munmap() the buffer.

I've been tinkering with using xdrmem_create() and mmap() and so far
things seem to work okay as long as the file I use is on a local UFS
filesystem. Today I tried it with an NFS filesystem, and almost right
away I ran into trouble.

I'm trying to map the file in chunks of 4096 bytes (since that's the
system page size). Initially, the file doesn't exist, so I try to create
it and use ftruncate(2) to expand it to 8192 bytes. As new binding
information is obtained, it needs to be written out to the file. Using
xdr_sizeof(), I can tell how much file space I will need. If the existing
file is big enough, I mmap() it, use the XDR filter to encode the data
into the mmap()ed region, then munmap() the region. If the data is too
large to fit in the existing file, I use ftruncate() to expand the file
to the next 8192-byte chunk, then mmap(), XDR, and munmap() again.

One of two things can happen when using NFS: either the sample caching
program becomes wedged and can't be killed (ps -alx shows it to be in
the 'vmopar' state) or else the process wedges the system and I have to
reboot. Using the kernel debugger, I've determined in the latter case
that the system wedges because nfs_bioread() becomes stuck in a loop.

This case is a bit peculiar. The conditions seem to be as follows:

- There's an existing file on disk of 8192 bytes in size. The actual
  data occupies slightly less than 4096 bytes of this.
- The program opens the file, mmap()s it, and uses the XDR filter to
  read the data from the mmap()ed region and convert it into the
  original hash table and associated stuff. Once the loading is done,
  the region is munmap()ed.
- The program adds an entry to the table, which makes the total size
  slightly more than 4096 bytes.
- The program uses ftruncate() to adjust the file size. This is actually
  a mistake here: the data still fits in 8192 bytes so the file size
  doesn't change. The program then mmap()s the file and calls the XDR
  filter to start encoding the data into the mmap()ed region.
- This is where the system wedges. A stack trace shows that vm_fault()
  has led to a call to nfs_getpages(), which in turn calls
  nfs_bioread(). nfs_bioread() gets caught in a loop, calling
  nfs_getcacheblk() over and over again. Somehow or other,
  nfs_getcacheblk() fails, so nfs_bioread() calls brelse(), then loops
  around and calls nfs_getcacheblk() again, which fails again, etc...

The problem seems to happen when the XDR filter crosses the boundary
between the 4096-byte pages. Once it passes 4096 bytes, I think it tries
to fault in the second page, and this is where it gets trapped in a
loop.

Unfortunately, I don't have a sample program yet that duplicates this
condition: the test program that triggers it has lots of NIS+ junk in it
which I need to strip out. With luck I'll be able to do this over the
weekend, but I'll need to go back to campus to test it (the test machine
is in one of the labs, and I don't want to wedge it from home since I
won't be able to reboot it).

I do have a sample program that duplicates the first problem, where the
process becomes wedged and unkillable. To test this program, compile it,
then cd to an NFS filesystem (it doesn't matter if it's NFS v2 or v3, or
what OS the server is running). Run the program, and if your system is
like mine, it will hang and refuse to die.

I could easily just switch to using xdrstdio_create() and fopen() but
that would be giving up too easily: this stuff is supposed to work, and
I'm not going to stop making noise about it until it does.
Anyone else notice this sort of thing, or am I the only one who's
bothered to play with mmap() and NFS at the same time?

-Bill

--
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================

/* Header names were lost in the archive; these are what the code needs. */
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CB_CHUNKSIZE 8192

static char *stuff;
static unsigned long stuff_size;
static int cachebind_fd;
static caddr_t laddr;
static unsigned long csize;

int
nis_cachebind_dump(void)
{
	unsigned long tsize, fsize;

	tsize = stuff_size;

	/* note: this does not round up to a multiple of CB_CHUNKSIZE */
	if (tsize > CB_CHUNKSIZE)
		fsize = tsize + (tsize % CB_CHUNKSIZE);
	else
		fsize = CB_CHUNKSIZE;

	printf("FSIZE: %lu SIZE %lu\n", fsize, tsize);
	printf("MOD: %lu\n", tsize % CB_CHUNKSIZE);
	printf("MOD: %lu\n", CB_CHUNKSIZE % tsize);

	/* unmap the region */
	munmap(laddr, csize);

	/* change file size */
	ftruncate(cachebind_fd, fsize);
	csize = fsize;

	/* remap */
	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
	    MAP_SHARED, cachebind_fd, 0);
	if (laddr == MAP_FAILED)
		return(-1);

	bcopy(stuff, laddr, stuff_size);

	/* unmap again */
	munmap(laddr, csize);

	return(0);
}

int
nis_cachebind_init(char *fname)
{
	cachebind_fd = open(fname, O_RDWR|O_CREAT, 0644);
	if (cachebind_fd == -1)
		return(-1);

	stuff_size = 4000;
	stuff = calloc(1, stuff_size);

	return(nis_cachebind_dump());
}

int
nis_cachebind_load(char *fname)
{
	struct stat st;

	if (stat(fname, &st) == -1) {
		if (errno != ENOENT)
			return(-1);
		return(nis_cachebind_init(fname));
	}

	csize = st.st_size;

	cachebind_fd = open(fname, O_RDWR, 0644);
	if (cachebind_fd == -1)
		return(-1);

	laddr = mmap(0, csize, PROT_READ|PROT_WRITE,
	    MAP_SHARED, cachebind_fd, 0);
	if (laddr == MAP_FAILED) {
		close(cachebind_fd);
		return(-1);
	}

	stuff = calloc(1, csize);
	stuff_size = csize;
	bcopy(laddr, stuff, csize);
	munmap(laddr, csize);

	return(0);
}

int
main(void)
{
	char *ptr;
	unsigned long i;

	if (nis_cachebind_load("test") == -1)
		return(1);

	/* dirty the memory */
	ptr = stuff;
	for (i = 0; i < stuff_size; i++) {
		*ptr = '?';
		ptr++;
	}

	/* dump */
	nis_cachebind_dump();

	/* make it bigger */
	stuff_size += 4000;
	stuff = realloc(stuff, stuff_size);

	/* dirty it again */
	ptr = stuff;
	for (i = 0; i < stuff_size; i++) {
		*ptr = '?';
		ptr++;
	}

	/* dump */
	nis_cachebind_dump();

	return(0);
}