Date:      Tue, 30 Nov 1999 17:34:46 -0800 (PST)
From:      hostetlb@agcs.com
To:        freebsd-gnats-submit@freebsd.org
Subject:   kern/15195: Kernel hangs during concurrent NFS writes
Message-ID:  <19991201013446.77E4615AA7@hub.freebsd.org>


>Number:         15195
>Category:       kern
>Synopsis:       Kernel hangs during concurrent NFS writes
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 30 17:40:00 PST 1999
>Closed-Date:
>Last-Modified:
>Originator:     Bly Hostetler
>Release:        2.2.8 through 3.3
>Organization:
AG Communication Systems
>Environment:
FreeBSD calvin-t1.labs.agcs.com 2.2.8-RELEASE FreeBSD 2.2.8-RELEASE #0: Tue Nov
30 09:51:56 GMT 1999     root@calvin-t1.labs.agcs.com:/usr/src/sys/compile/AGCS
 i386
>Description:
We have multiple processes that have opened the same file with mode "a+"
(append, read-access).  The file is located on an NFS mounted partition
(easiest to reproduce using UDP, NFSv3; but we have seen it using UDP
or TCP/IP, and NFSv3 or NFSv2).  When the processes start writing to that
file at the same time, the OS locks up in the kernel: no login prompts,
no shell prompts, no I/O, nothing.

Trapping to DDB (ctrl-alt-esc) shows that the kernel is inside _nfs_write;
we have trapped it at several locations, but always within _nfs_write.
Following is an example "trace" from DDB :

_biodone(f418d200,f418d200,f22bf000,f22dc600,3) at _biodone+0x2e6
_nfs_doio(f418d200,f22df000,f22dc600,1,f418d200) at _nfs_doio+0x4c5
_nfs_strategy(efbffddc) at _nfs_strategy+0x68
_nfs_writebp(f418d200,1,efbffec4,f0163000,efbffe50) at _nfs_writebp+0x125
_nfs_bwrite(efbff50) at _nfs_bwrite+0x10
_nfs_write(efbffee08,f02564e0,1,efbff94,2) at _nfs_write+0x648
_vn_write(f25d6f80,efbfff34,f22bf000,f02564e0,f22c600) at _vn_write+0x93
_write(f220c600,efbff94,efbff84,0,5167) at _write+0x97
_syscall

We mounted the directory using :

/sbin/mount_nfs -U -c plato-t1:/u/FreeBSD/decm /usr/home/decm

We are running on a P-II 450 with 512 MB RAM, 1024 MB swap, and 100 Mbit Ethernet.

Although we initially detected the problem in FreeBSD 2.2.8, we
subsequently loaded FreeBSD 3.3, and had the same results.

The sample code below was created to simulate third-party software that
was actually the original cause of the lock-ups.  We understand that this
form of concurrent writes to a file is asking for trouble, but we did
not have control over the problem code.  The sample code below worked
when tried on SCO Unix using the same NFS mounted directory (and file).

We have identified a fix to the C source code, and passed this on to
our vendor, but we also believe this problem should be corrected in the
OS's NFS layer.
>How-To-Repeat:
The following program can be used to reproduce the problem.  By default it
spawns 10 writers, but can spawn any number.  We have only used 10 and 50,
and the problem occurs immediately (within 5-10 writes to the file).

*** BEGIN C CODE ***

#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <unistd.h>     /* fork */

int
main(int argc,
     char **argv)
{
    int forks = 10;
    int writer_number = 1;
    FILE *fp1;
    int i = 0;

    if (argc > 1) {
        forks = atoi(argv[1]);
    }
    fprintf(stderr, "Spawning %d writers\n", forks);

    while (--forks > 0) {
        if (!fork()) {
            /* Child */
            break;
        }
        writer_number++;
    }

    fprintf(stderr, "Writer number %d\n", writer_number);

    while (1) {
        while (!(fp1 = fopen("F1", "a+")));

        fprintf(fp1, "%d %d\n", writer_number, i);
        fflush(fp1);
        fclose(fp1);

        i++;
    }
}

*** END OF C CODE ***

The problem lies in the fact that each process is opening the file for
append, but is not locking the file for exclusive access.

The user-level workaround for the above code is to lock the file for
exclusive access before writing.  Add the following lines in the
locations indicated :

/* ... at the top of the file */
#include <sys/file.h>

...

        /* After the "while (!(...fopen(...)));" add this */
        flock(fileno(fp1), LOCK_EX);
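
For reference, the workaround can be pulled together into a single
routine.  This is only a sketch of how the locked append might look, not
the fix we passed to our vendor: the helper name append_record and its
return convention are made up for illustration, and it assumes all
writers run on the same client, as in our test.  Whether an flock() lock
is visible to other NFS clients depends on the system.

```c
#include <stdio.h>
#include <sys/file.h>   /* flock, LOCK_EX */

/*
 * Append one "writer sequence" record to path while holding an
 * exclusive flock(2) lock.  Returns 0 on success, -1 on failure.
 * append_record is a hypothetical helper name, not from the report.
 */
static int
append_record(const char *path, int writer_number, int i)
{
    FILE *fp1;
    int rc = 0;

    if ((fp1 = fopen(path, "a+")) == NULL)
        return -1;

    /* Block until no other process holds the lock on this file. */
    if (flock(fileno(fp1), LOCK_EX) == -1) {
        fclose(fp1);
        return -1;
    }

    if (fprintf(fp1, "%d %d\n", writer_number, i) < 0)
        rc = -1;
    if (fflush(fp1) == EOF)     /* push the data out before unlocking */
        rc = -1;

    /* fclose() closes the descriptor, which releases the lock. */
    if (fclose(fp1) == EOF)
        rc = -1;
    return rc;
}
```

In the test program, the body of the "while (1)" loop would then reduce
to a call such as append_record("F1", writer_number, i++).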

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:





