From owner-freebsd-bugs@FreeBSD.ORG Wed Jun 28 19:50:19 2006
Return-Path:
X-Original-To: freebsd-bugs@hub.freebsd.org
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 6BBDA16A4E6
    for ; Wed, 28 Jun 2006 19:50:19 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 5824F43D67
    for ; Wed, 28 Jun 2006 19:50:18 +0000 (GMT)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id k5SJoICA098781
    for ; Wed, 28 Jun 2006 19:50:18 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.13.4/8.13.4/Submit) id k5SJoIPw098780;
    Wed, 28 Jun 2006 19:50:18 GMT
    (envelope-from gnats)
Resent-Date: Wed, 28 Jun 2006 19:50:18 GMT
Resent-Message-Id: <200606281950.k5SJoIPw098780@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Helio Luchtenberg Junior
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id DCC7916A608
    for ; Wed, 28 Jun 2006 19:43:31 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
    by mx1.FreeBSD.org (Postfix) with ESMTP id F0A7F44537
    for ; Wed, 28 Jun 2006 18:47:31 +0000 (GMT)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
    by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k5SIlVvt029833
    for ; Wed, 28 Jun 2006 18:47:31 GMT
    (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
    by www.freebsd.org (8.13.1/8.13.1/Submit) id k5SIlVql029832;
    Wed, 28 Jun 2006 18:47:31 GMT
    (envelope-from nobody)
Message-Id:
    <200606281847.k5SIlVql029832@www.freebsd.org>
Date: Wed, 28 Jun 2006 18:47:31 GMT
From: Helio Luchtenberg Junior
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-2.3
Cc:
Subject: kern/99588: UFS2 filesystems hang when doing "fsck -B" or "dump -L" or "mksnapffs" in a moderated I/O filesystem with many file locks/unlocks
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Jun 2006 19:50:19 -0000

>Number: 99588
>Category: kern
>Synopsis: UFS2 filesystems hang when doing "fsck -B" or "dump -L" or "mksnapffs" in a moderated I/O filesystem with many file locks/unlocks
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Jun 28 19:50:17 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator: Helio Luchtenberg Junior
>Release: FreeBSD 5.4p12
>Organization:
Viamidia Tecnologia S.A.
>Environment:
FreeBSD freeteste.viamidia.com 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #1: Wed Mar 8 16:08:29 UTC 2006 :/usr/obj/usr/src/sys/VIAMIDIA i386
>Description:
We have observed the filesystem "freezing" when creating a filesystem
snapshot (mksnap_ffs), running a background check (fsck -B), or dumping
(dump -L) on a UFS2 filesystem (the default type for FreeBSD 5.x) that is
under moderate I/O while many processes are intensively locking and
unlocking files on it.

After the filesystem freezes, the processes trying to access files on it
make no further progress. They stay in the "D" (disk wait) state forever.
This appears to be a deadlock: the processes holding locks on that
filesystem cannot be killed or aborted in any way (even with kill -9), and
all of them are seen in the "D" state.
After some time, "iostat" showed no disk activity on that filesystem.

We were able to reproduce the problem; the steps are described below.
>How-To-Repeat:
We created three directories under the mount point "/jails" (a UFS2
filesystem): /jails/teste1, /jails/teste2 and /jails/teste3. In each of
these directories we put 120 files.

We then ran six instances of the program below: two copies pointing at the
files in "/jails/teste1", two copies modified to point at the files in
"/jails/teste2", and two copies modified to point at "/jails/teste3".

While these six copies are running, we try to create a snapshot of that
filesystem (/jails). After some time the filesystem hangs and no further
activity can be seen on it. All six copies of the program are found in the
"D" state, waiting for a disk operation to complete.

The only way we found to restore the filesystem to a working state is to
reboot the server.

---------------------------------
#include <stdio.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

struct dirent *dp;
DIR *dirp;
char name[4096];
int arq;

int
main(int argc, char *argv[])
{
    arq=0;
    while(1) {
        /* First pass: take an exclusive lock on every file. */
        dirp = opendir("/jails/teste1");
        arq=0;
        dp=readdir(dirp); /* skip directory "." */
        dp=readdir(dirp); /* skip directory ".." */
        while ((dp = readdir(dirp)) != NULL) {
            sprintf(name,"/jails/teste1/%s",dp->d_name);
            arq=open(name,O_RDWR);
            flock(arq,LOCK_EX);
            close(arq);
        }
        (void)closedir(dirp);

        /* Second pass: issue an unlock on every file. */
        dirp = opendir("/jails/teste1");
        arq=0;
        dp=readdir(dirp); /* skip directory "." */
        dp=readdir(dirp); /* skip directory ".." */
        while ((dp = readdir(dirp)) != NULL) {
            sprintf(name,"/jails/teste1/%s",dp->d_name);
            arq=open(name,O_RDWR);
            flock(arq,LOCK_UN);
            close(arq);
        }
        (void)closedir(dirp);
    }
}
-------------------------------
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: