From owner-freebsd-bugs Tue Nov 30 17:40: 7 1999
Delivered-To: freebsd-bugs@freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21])
	by hub.freebsd.org (Postfix) with ESMTP id D885C15A43
	for ; Tue, 30 Nov 1999 17:40:00 -0800 (PST)
	(envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.9.3/8.9.2) id RAA66072;
	Tue, 30 Nov 1999 17:40:00 -0800 (PST)
	(envelope-from gnats@FreeBSD.org)
Received: by hub.freebsd.org (Postfix, from userid 32767)
	id 77E4615AA7; Tue, 30 Nov 1999 17:34:46 -0800 (PST)
Message-Id: <19991201013446.77E4615AA7@hub.freebsd.org>
Date: Tue, 30 Nov 1999 17:34:46 -0800 (PST)
From: hostetlb@agcs.com
To: freebsd-gnats-submit@freebsd.org
X-Send-Pr-Version: www-1.0
Subject: kern/15195: Kernel hangs during concurrent NFS writes
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Number:         15195
>Category:       kern
>Synopsis:       Kernel hangs during concurrent NFS writes
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 30 17:40:00 PST 1999
>Closed-Date:
>Last-Modified:
>Originator:     Bly Hostetler
>Release:        2.2.8 through 3.3
>Organization:
AG Communication Systems
>Environment:
FreeBSD calvin-t1.labs.agcs.com 2.2.8-RELEASE FreeBSD 2.2.8-RELEASE #0: Tue Nov 30 09:51:56 GMT 1999 root@calvin-t1.labs.agcs.com:/usr/src/sys/compile/AGCS i386
>Description:
We have multiple processes that have opened the same file with mode
"a+" (append, read-access). The file is located on an NFS mounted
partition. The problem is easiest to reproduce using UDP and NFSv3,
but we have seen it with both UDP and TCP/IP, and with both NFSv3 and
NFSv2.

When the processes start writing to that file at the same time, the OS
locks up inside the kernel: no login prompts, no shell prompts, no i/o,
nothing.

Trapping to DDB (ctrl-alt-esc) shows that the kernel is inside
_nfs_write; we have trapped it at several locations, but always within
_nfs_write. Following is an example "trace" from DDB:

_biodone(f418d200,f418d200,f22bf000,f22dc600,3) at _biodone+0x2e6
_nfs_doio(f418d200,f22df000,f22dc600,1,f418d200) at _nfs_doio+0x4c5
_nfs_strategy(efbffddc) at _nfs_strategy+0x68
_nfs_writebp(f418d200,1,efbffec4,f0163000,efbffe50) at _nfs_writebp+0x125
_nfs_bwrite(efbff50) at _nfs_bwrite+0x10
_nfs_write(efbffee08,f02564e0,1,efbff94,2) at _nfs_write+0x648
_vn_write(f25d6f80,efbfff34,f22bf000,f02564e0,f22c600) at _vn_write+0x93
_write(f220c600,efbff94,efbff84,0,5167) at _write+0x97
_syscall

We mounted the directory using:

	/sbin/mount_nfs -U -c plato-t1:/u/FreeBSD/decm /usr/home/decm

We are running on a P-II 450, 512 Meg RAM, 1024 Swap, 100 Mbit
ethernet.

Although we initially detected the problem in FreeBSD 2.2.8, we
subsequently loaded FreeBSD 3.3 and had the same results.

The sample code below was created to simulate the third-party software
that originally caused the lock-ups. We understand that this form of
concurrent writes to a file is asking for trouble, but we did not have
control over the problem code. The sample code below worked when tried
on SCO Unix using the same NFS mounted directory (and file).

We have identified a fix to the C source code and passed it on to our
vendor, but we also believe this problem should be corrected in the
OS's NFS layer.
>How-To-Repeat:
The following program can be used to create the problem. By default it
spawns 10 writers, but it can spawn any number. We have only used 10
and 50, and the problem occurs immediately (within 5-10 writes to the
file).

*** BEGIN C CODE ***
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int forks = 10;
	int writer_number = 1;
	FILE *fp1;
	int i = 0;

	if (argc > 1) {
		forks = atoi(argv[1]);
	}
	fprintf(stderr, "Spawning %d writers\n", forks);

	/* The parent is one writer; it forks the remaining forks - 1. */
	while (--forks > 0) {
		if (!fork()) {
			/* Child */
			break;
		}
		writer_number++;
	}
	fprintf(stderr, "Writer number %d\n", writer_number);

	/* Every writer appends to the same file forever, with no locking. */
	while (1) {
		while (!(fp1 = fopen("F1", "a+")));
		fprintf(fp1, "%d %d\n", writer_number, i);
		fflush(fp1);
		fclose(fp1);
		i++;
	}
}
*** END OF C CODE ***

The problem lies in the fact that each process opens the file for
append, but does not lock the file for exclusive access.

The user-level work-around to the above code is to lock the file for
exclusive access by adding the following lines (in the locations
indicated):

/* ... at the top of the file */
#include <sys/file.h>
...
/* After the "while (!(...fopen(...)));" add this */
flock(fileno(fp1), LOCK_EX);

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
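
The user-level work-around described under How-To-Repeat can be
sketched as one complete helper that replaces the body of the writer
loop. This is only an illustration under stated assumptions:
append_locked() is a hypothetical name that does not appear in the
report, and the error handling and explicit LOCK_UN are additions. Note
that flock() is advisory and its behaviour on NFS differs between
systems, so this shows the shape of the work-around rather than a
guaranteed fix.

```c
#include <stdio.h>
#include <sys/file.h>	/* flock(), LOCK_EX, LOCK_UN */

/*
 * Append one "writer_number seq" record to path while holding an
 * exclusive advisory lock on the file.
 * Returns 0 on success, -1 on any failure.
 * (append_locked is a hypothetical helper, not part of the report.)
 */
int
append_locked(const char *path, int writer_number, int seq)
{
	FILE *fp = fopen(path, "a+");
	int rc = 0;

	if (fp == NULL)
		return -1;

	/* Block until no other process holds a lock on this file. */
	if (flock(fileno(fp), LOCK_EX) == -1) {
		fclose(fp);
		return -1;
	}

	if (fprintf(fp, "%d %d\n", writer_number, seq) < 0)
		rc = -1;
	/* Flush while the lock is still held, so the write happens under it. */
	if (fflush(fp) == EOF)
		rc = -1;

	flock(fileno(fp), LOCK_UN);
	if (fclose(fp) == EOF)
		rc = -1;
	return rc;
}
```

In the test program, the inner loop body would then reduce to a call
such as append_locked("F1", writer_number, i), retried while it returns
-1, followed by i++.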