From: Subbsd
Date: Sat, 22 Oct 2011 00:08:54 +0400
To: freebsd-stable@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: VFS problem with "fcntl SETLK" and nullfs

Hi,

I have found a serious problem with nullfs mounts on FreeBSD that shows up at seemingly random moments. I first ran into it on FreeBSD-current, on a host with a large number of jails, at the time the jails start.

The setup follows the handbook scenario:
1) have a read-only base (for example /usr/jails/base)
2) have a writable area for each jail's own data (for example: /usr/jails/j1data/{home,var,local,...})
3) mount the RO base at the new jail's location, then mount the RW data on top of it

In some cases I saw the system freeze while working with nullfs mounts, but could not find the reason. In a test environment I tried running mount_nullfs while doing different things to the source directory:
- heavy read load generated with dd(1): no effect
- heavy write load generated with dd(1): no effect
- a script creating and deleting random files in large numbers: no effect

Now, however, I can reproduce the problem 100% of the time: it is easily triggered by "svn cleanup", for example in a /usr/src checked out from SVN. If I start "svn cleanup" in /usr/src and run mount_nullfs at the same time, the problem appears. As far as I can see, cleanup locks files very frequently. It seems to me that some of these locks are handled incorrectly and end in a deadlock.

I wrote a sample script that simulates the problem. It alternates read-only and read-write mounts on purpose, to repeat the procedure described in the jail section of the handbook. Since the problem can appear at a random moment, I put everything in an infinite loop, but I usually hit it on the first pass.

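For reference, the handbook-style layout from steps 1) to 3) can be sketched roughly as follows. This is only an illustration: /usr/jails/base and /usr/jails/j1data are the example paths given above, while the jail root /usr/jails/j1, the RUN variable and the jail_mounts function are assumptions made for the sketch. By default it only prints the mount commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the handbook-style jail layout (prints commands only).
BASE="/usr/jails/base"      # read-only base, example path from the text
DATA="/usr/jails/j1data"    # per-jail writable data, example path from the text
JAIL="/usr/jails/j1"        # ASSUMED jail root, not named in the report
RUN="${RUN:-echo}"          # hypothetical switch: export RUN="" to really mount (needs root)

jail_mounts()
{
	# step 3, first half: read-only base at the jail location
	${RUN} mount -o ro -t nullfs "${BASE}" "${JAIL}"
	# step 3, second half: writable directories on top of the RO base
	for d in home var local; do
		${RUN} mount -o rw -t nullfs "${DATA}/${d}" "${JAIL}/${d}"
	done
}

jail_mounts
```

Every jail started this way adds one read-only nullfs mount plus one read-write mount per data directory, which is the mount pattern the reproduction script repeats in a loop.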
Here is the script:

-------/cut/-----
#!/bin/sh
SRCROOT="/usr/src"
DSTROOT="/usr/nullfstest"
ITER=$(seq 100)
# names of the top-level directories in the source tree
MOUNTO=$(find "${SRCROOT}" -maxdepth 1 -type d -exec basename {} \;)

[ -d "${DSTROOT}" ] || mkdir "${DSTROOT}"

# mount a read-write nullfs (/bin serves as an arbitrary source)
# over every top-level subdirectory of the tree given in $1
mount_subdir()
{
	for mto in ${MOUNTO}; do
		if [ -d "${1}/${mto}" ]; then
			mount -o rw -t nullfs /bin "${1}/${mto}"
		fi
	done
}

cd "${SRCROOT}" || exit 1

while true; do
	echo "Mount phase"
	# run "svn cleanup" in the background while the mounts are made
	lockf -s -t0 /tmp/svn.lock svn cleanup &
	for iter in ${ITER}; do
		DST="${DSTROOT}/${iter}"
		[ -d "${DST}" ] || mkdir "${DST}"
		# read-only view of the source tree first, then RW mounts on top
		mount -o ro -t nullfs "${SRCROOT}" "${DST}"
		mount_subdir "${DST}"
	done
	echo "Unmount phase"
	mount -t nullfs | awk '{ print "umount -f " $3 }' | sh
done
-------/end of cut/-----

The last syscall I can see from svn cleanup is:

fcntl(3,F_SETLK,0x7fffffffc9b0)

where 3 is the fd of some file under .svn/.

At this point the system (kernel) still appears to work. But as soon as any process or your session touches the source directory (/usr/src in this example), for instance with:

cd /usr/src
fstat /usr/src/*
ls /usr/src/

the filesystem deadlocks. In addition, a system in this state cannot reboot without help: it never gets past the stage of flushing buffers to storage.

The bug exists in FreeBSD 9.0 RC1.

PS: An important detail: I could not reproduce the problem on FreeBSD running in a virtual machine (VirtualBox). Maybe this is related to the tick / kern.hz setting?

PS2: The file system does not matter. I get the problem on ZFS as well as on UFS.

Please check this information; it seems to be serious.

Thanks.