Date:      Wed, 28 Jun 2006 18:47:31 GMT
From:      Helio Luchtenberg Junior <hlj@viamidia.net>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/99588: UFS2 filesystems hang when doing "fsck -B" or "dump -L" or "mksnapffs" in a moderated I/O filesystem with many file locks/unlocks
Message-ID:  <200606281847.k5SIlVql029832@www.freebsd.org>
Resent-Message-ID: <200606281950.k5SJoIPw098780@freefall.freebsd.org>


>Number:         99588
>Category:       kern
>Synopsis:       UFS2 filesystems hang when doing "fsck -B" or "dump -L" or "mksnapffs" in a moderated I/O filesystem with many file locks/unlocks
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jun 28 19:50:17 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Helio Luchtenberg Junior
>Release:        FreeBSD 5.4p12
>Organization:
Viamidia Tecnologia S.A.
>Environment:
FreeBSD freeteste.viamidia.com 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #1: Wed Mar  8 16:08:29 UTC 2006     :/usr/obj/usr/src/sys/VIAMIDIA  i386

>Description:
We have observed filesystems freezing when creating a filesystem snapshot (mksnap_ffs), running fsck in the background (fsck -B), or dumping from a snapshot (dump -L) on a UFS2 filesystem (the default for FreeBSD 5.x) that is under moderate I/O load while many processes are intensively locking and unlocking files on it.

After the filesystem freezes, the processes trying to access files on it make no further progress.  They stay in the "D" state (disk wait) indefinitely.  This appears to be a deadlock: the processes holding locks on that filesystem cannot be killed or aborted in any way (not even with kill -9), and all of them are seen in the "D" state.  After some time, "iostat" shows no disk activity on that filesystem.
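
A minimal sketch of how such a freeze might be detected from a monitoring script without hanging the monitor itself.  It assumes that a stat() on a path inside the frozen filesystem blocks like any other access; this report does not confirm that.  The function name probe_path and the 100 ms polling interval are illustrative, not part of the original report.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: stat() the given path in a child process and
 * wait up to timeout_sec seconds for it to finish.
 * Returns 0 if the child finished in time, 1 if it was still blocked
 * when the timeout expired, and -1 if fork() failed.
 */
int
probe_path(const char *path, int timeout_sec)
{
        struct timespec tick = { 0, 100 * 1000 * 1000 }; /* 100 ms */
        pid_t pid;
        int status, waited_ms;

        assert(path != NULL);

        pid = fork();
        if (pid < 0)
                return (-1);
        if (pid == 0) {
                /* Child: this is the call assumed to block on a hung filesystem. */
                struct stat sb;
                (void)stat(path, &sb);
                _exit(0);
        }

        /* Parent: poll for the child without blocking. */
        for (waited_ms = 0; waited_ms < timeout_sec * 1000; waited_ms += 100) {
                if (waitpid(pid, &status, WNOHANG) == pid)
                        return (0);
                nanosleep(&tick, NULL);
        }

        /*
         * The child did not finish.  Send SIGKILL but do not wait for it:
         * a process stuck in the "D" state does not act on the signal
         * until its disk wait ends.
         */
        kill(pid, SIGKILL);
        return (1);
}
```

Calling probe_path("/jails", 10) before and after starting the snapshot would give a rough indication of whether the filesystem still responds; the choice of path and timeout is an assumption.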

We can reproduce the problem; the steps are described below.
>How-To-Repeat:
We created three directories under the mount point "/jails" (a UFS2 filesystem): /jails/teste1, /jails/teste2, /jails/teste3.  In each of these directories we placed 120 files.  We then ran six instances of the program below: two copies pointing at the files in "/jails/teste1", two copies modified to point at the files in "/jails/teste2", and two copies modified to point at the files in "/jails/teste3".  While these six copies are running, we try to create a snapshot of that filesystem (/jails); after some time the filesystem hangs and no further activity can be seen on it.  All six copies of the program are found in the "D" state, waiting for a disk operation to complete.  The only way we have found to restore the filesystem to a working state is to reboot the server.

---------------------------------
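#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/file.h>
#include <dirent.h>

struct dirent *dp;
DIR *dirp;
char name[4096];
int arq;

int
main(int argc, char *argv[])
{
        arq=0;
        while(1)
        {
           /* Pass 1: take an exclusive lock on every file, then close it. */
           dirp = opendir("/jails/teste1"); arq=0;
           if (dirp == NULL) { perror("opendir"); exit(1); }
           dp=readdir(dirp); /* skip directory "." */
           dp=readdir(dirp); /* skip directory ".." */

           while ((dp = readdir(dirp)) != NULL)
           {
                snprintf(name,sizeof(name),"/jails/teste1/%s",dp->d_name);
                arq=open(name,O_RDWR);
                if (arq < 0) continue; /* skip files that cannot be opened */
                flock(arq,LOCK_EX);
                close(arq); /* closing the descriptor also drops the lock */
           }
           (void)closedir(dirp);

           /* Pass 2: reopen every file and issue an explicit unlock. */
           dirp = opendir("/jails/teste1"); arq=0;
           if (dirp == NULL) { perror("opendir"); exit(1); }
           dp=readdir(dirp); /* skip directory "." */
           dp=readdir(dirp); /* skip directory ".." */

           while ((dp = readdir(dirp)) != NULL)
           {
                snprintf(name,sizeof(name),"/jails/teste1/%s",dp->d_name);
                arq=open(name,O_RDWR);
                if (arq < 0) continue; /* skip files that cannot be opened */
                flock(arq,LOCK_UN);
                close(arq);
           }
           (void)closedir(dirp);
        }
}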
-------------------------------
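
The setup step described above (three directories with 120 files each) could be scripted along the following lines.  This is only a sketch: the report does not say how the files were created or what they were called, so the function name populate_dir and the file names file000, file001, ... are assumptions.  The files are created empty, which is sufficient for the test program because it only opens them with O_RDWR and calls flock() on them.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create directory `dir` if it does not exist yet
 * and make sure it contains `count` files named file000, file001, ...
 * Returns the number of files that could be created or opened,
 * or -1 if the directory could not be created.
 */
int
populate_dir(const char *dir, int count)
{
        char path[4096];
        int i, fd, made = 0;

        assert(dir != NULL && count >= 0);

        if (mkdir(dir, 0755) != 0 && errno != EEXIST)
                return (-1);

        for (i = 0; i < count; i++) {
                snprintf(path, sizeof(path), "%s/file%03d", dir, i);
                /* O_CREAT without O_EXCL: an already existing file is reused. */
                fd = open(path, O_RDWR | O_CREAT, 0644);
                if (fd < 0)
                        continue;
                close(fd);
                made++;
        }
        return (made);
}
```

For the setup in this report it would be called once per directory, for example populate_dir("/jails/teste1", 120), and likewise for /jails/teste2 and /jails/teste3.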
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:


