From: Rafi Rubin
Date: Sat, 21 Jul 2012 07:50:31 -0400
To: freebsd-fs@freebsd.org
Subject: ZFS deadlock
Message-ID: <500A9787.5060109@ugcs.caltech.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have a small server with a mirrored pair of hard drives that handles generally light loads for a number of Linux machines over NFSv3.
I've been seeing some deadlocks (and possibly worse) lately. I've narrowed the freezing down to a simple test run simultaneously on a number of the client machines:

    mount host:/food/bar
    ls bar/.zfs/snapshot

When I do that, all the client machines hang on any access to the server. On the server, only the volume "bar" hangs; the rest are fine. This occurs even when there is no other load on the server. Also, after rebooting the server, the client machines eventually complete the ls and behave normally.

I think this doesn't happen if the directory is cached, either in memory or on an L2 SSD, but I need to rerun some tests.

For now the machine seems to be stable with nfsd limited to a single thread.

Any help debugging would be appreciated,
Rafi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJQCpeEAAoJEPILXytRLnK22rAQAJpEc9hVdfrNaqJLbqf1mwiJ
kzLAh2FHmhzWvomA6WBdpR5JZ535vkQjcd3gS9KdKjmX+KxozTGM+CiX8quJwVef
nLOXqY5KNqDLQVpfJtMbo+XybM6nmLN8YImX/wIAbVXtRYwOPLP/9yk5aYywZnPN
MFhQrh/PnMydUYJwYfZHdEGACnVJnPWord9p9OD7O7FxsItMPMDNRZPrLr2BEHPm
LheVqXSXsvBe8w/CtQXoyCvWRr9J+bA2/Djid6RNpUMH7S8f6+xKzlYxrivVyFjY
s88NiCbDrOefNEJMldsBeSrC/G3ZTM2iMZMr7KV9hqGbp9rJvth9KZsoGkpjyA9F
//z1LNjUVoc+ol1z0oooNKpPSEvYCU/21mTZ9lZ0p9FeRSti2jH33zyYXmv5N8TO
6pjkO14TpBby/j1uNI3CxKGp3XN/o67AkpCBcJFQqfmFC2o0wx6f1PTqbAvSIwR3
M+srT/plBIx0CI53WYMnSunaw126ZxbiT9UiK+7OoAwXT5v62Wi7bWU0Jd5Ezbvc
8Bo+RbkVnEuLW0xTr0gbEXJy2m3/Gmq+G570D7bEqiXdZG3gpcehXEZ9tCStQ1ZM
QNtf7lFmgamsBETyiaIbj9U0fDd/1Uddp5dHjxccF5qOBs7CNnCdJ+YAcpkT+Rup
KMPYRNrdyABggVIrkkPg
=4BS7
-----END PGP SIGNATURE-----
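
For anyone trying to reproduce the hang described above, here is a rough sketch of a helper that starts several copies of a command at once and reports which ones never return. It is not part of the original report: the name probe, the count and the timeout are all made up for illustration, and it only approximates the test by issuing parallel listings from one client, whereas the report had the listings coming from several machines at the same time. It deliberately does not wait on a process after killing it, on the assumption that a process blocked on an unresponsive NFS mount may not exit until the server answers.

```python
import subprocess
import time


def probe(cmd, count, timeout):
    """Start `count` copies of `cmd` together (hypothetical helper).

    Returns one status per copy: "ok" (exit code 0), "failed" (nonzero
    exit code) or "hung" (still running after `timeout` seconds).
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(count)
    ]
    results = [None] * count
    deadline = time.monotonic() + timeout

    # Poll instead of blocking, so a stuck child cannot stall the probe itself.
    while time.monotonic() < deadline and any(r is None for r in results):
        for i, proc in enumerate(procs):
            if results[i] is None:
                code = proc.poll()
                if code is not None:
                    results[i] = "ok" if code == 0 else "failed"
        time.sleep(0.05)

    for i, proc in enumerate(procs):
        if results[i] is None:
            results[i] = "hung"
            # Best effort only: do not wait here, the child may be unkillable
            # while it is blocked on the unresponsive mount.
            proc.kill()
    return results
```

On a client with the export already mounted, a call such as probe(["ls", "bar/.zfs/snapshot"], count=8, timeout=30.0) would list the snapshot directory eight times in parallel; the values 8 and 30.0 are arbitrary. Running it on several clients at once is closer to the failing case.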