From: rhfb@akira.stdio.com
To: freebsd-hackers@freebsd.org
Date: Wed, 9 Jun 2010 13:52:40 -0400 (EDT)
Subject: NFSD lockup running ESXi 4

I have an AMD64 FreeBSD 8.0 machine running 8-STABLE from around 2010/04/25 19:13:08. It has a ZFS disk, nfsd flags "-t -n 16", a private network used exclusively for NFS, no jumbo frames, HZ=1000, DEVICE_POLLING, ZERO_COPY_SOCKETS, and the following sysctl options:

    net.inet.tcp.recvspace=232140
    net.inet.tcp.sendspace=232140
    net.inet.tcp.slowstart_flightsize=159
    net.inet.tcp.mssdflt=1460

The FreeBSD machine exports a 6 TB zpool over NFS to three ESXi 4 hosts (newest patch level 193498), and this has been working reliably for months.

I added a new ESXi host, patched to the newest (post Update 1) patch level 256968, added a bunch of VMs, and booted them all into the Windows Server 2008 R2 install DVD. When I then attempted to do the installs in parallel, the NFS server started locking up. nfsd would wedge at 100% CPU in "rc_lo" (which I presume is rc_lock). Once wedged, "/etc/rc.d/nfsd restart" can't kill nfsd, so a reboot is required.
A reboot causes all my active VMs with pending disk writes to get disk errors inside the VM (the default timeout for disk writes in the VM is 10 seconds). This was very reproducible.

Has anyone noticed this problem? Is this an ESXi problem with the newest updates? Or is it a problem with NFS on FreeBSD 8?