From: Michael Tratz
Date: Wed, 24 Jul 2013 14:07:56 -0700
Subject: NFS deadlock on 9.2-Beta1
To: freebsd-stable@freebsd.org
Message-Id: <1EBB997A-B4FD-4F13-BFBE-5AFF460B524A@esosoft.com>

Two machines (NFS Server: running ZFS / Client: disk-less), both are running FreeBSD r253506. The NFS client starts to deadlock processes within a few hours. It usually gets worse from there on. The processes stay in "D" state. I haven't been able to reproduce it when I want it to happen. I only have to wait a few hours until the deadlocks occur when traffic to the client machine starts to pick up. The only way to fix the deadlocks is to reboot the client.
Even an ls of the deadlocked path will itself deadlock. It's totally random which part of the file system gets deadlocked. The NFS server itself has no problem at all accessing the files/path when something is deadlocked on the client.

Last night I decided to put an older kernel on the system, r252025 (June 20th). The NFS server stayed untouched. So far 0 deadlocks on the client machine (it should have deadlocked by now). FreeBSD is working hard like it always does. :-)

There are a few changes to the NFS code between the revision which seems to work and Beta1. I haven't tried to narrow down whether one of those commits is causing the problem. Maybe someone has an idea what could be wrong and I can test a patch, or whether it's something else, because I'm not a kernel expert. :-)

I have run procstat -kk several times on the deadlocked processes, including the ls. You can see the output here:

http://pastebin.com/1RPnFT6r

I have tried to mount the file system with and without nolockd. It didn't make a difference. Other than that it is mounted with:

rw,nfsv3,tcp,noatime,rsize=32768,wsize=32768

Let me know if you need me to do something else or if some other output is required. I would have to go back to the problem kernel and wait until the deadlock occurs to get that information.

Thanks for your help,

Michael