From: Mark Morley
To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org
Date: Thu, 1 Jun 2006 15:02:53 -0700
Subject: NFS processes locking up!!

Hi all,

We have an NFS server (amd64) running FreeBSD 6.1-STABLE. It serves a dozen or so clients, which are a mix of FreeBSD 4.11 and 6.1-STABLE. All NFS traffic is on a dedicated gigabit switched network.

Periodically we have a problem where it will stop serving up files. Running 'ps' on the server shows a number of processes stuck in the 'D' state -- "a process in disk (or other short term, uninterruptible) wait".
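
For anyone who wants to collect the same listing, here is a minimal sketch in Python of filtering 'ps' output down to the processes in the 'D' state, together with the wait channel each one is sleeping on. This is an illustration only, not something taken from the server: the function names are made up, and it assumes 'ps' accepts the output keywords pid, state, wchan and command.

```python
import subprocess


def parse_stuck(ps_output):
    """Return (pid, state, wchan, command) for every process whose
    state starts with 'D' (disk or other uninterruptible wait).

    Assumes four whitespace-separated columns in the order
    pid, state, wchan, command, preceded by one header row.
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)         # command may contain spaces
        if len(fields) < 4:
            continue                         # blank or truncated line
        pid, state, wchan, command = fields
        if state.startswith("D"):
            stuck.append((int(pid), state, wchan, command.rstrip()))
    return stuck


def list_stuck():
    """Run ps (hypothetical invocation; adjust keywords as needed)
    and return the processes currently in the 'D' state."""
    result = subprocess.run(
        ["ps", "-axo", "pid,state,wchan,command"],
        capture_output=True, text=True, check=True,
    )
    return parse_stuck(result.stdout)
```

Calling list_stuck() on the server while it is hung and printing each tuple should show which wait channel the nfsd processes are sleeping on, which is probably the most useful thing to include in a follow-up.
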
Usually this includes all the nfsd processes as well as any others that are trying to access the same disk drive. Any commands issued, like 'du', 'sync', etc., go into the same state and never exit. It is impossible to kill any of these processes.

We can pretty much force this to happen by running a large 'find' or something similar on the exported file system, although it will eventually happen by itself without any such commands being run.

Our only option (as far as we can tell) is to reboot the server, which results in a very long fsck period (it's over a terabyte of disk space).

This doesn't seem to be a hardware issue. This is a brand new server in all respects (all new hardware, new RAID), and we saw the exact same issue on the machine that it replaced (which was running 4.11 on i386).

Any thoughts on this? Any more info I should provide?

Mark

--
Mark Morley
Owner / Administrator
Islandnet.com