From: Mark Morley
To: freebsd-fs@freebsd.org
Date: Mon, 13 Mar 2006 19:17:40 -0800
Subject: Processes stuck in 'disk wait' state

Got what seems to be an NFS issue on two different servers where processes seem to be deadlocked, stuck in a 'disk wait' state.

The first server was running fine for a couple years with 4.x, but then a month or so ago I noticed a bunch of cron-spawned processes stuck in disk wait ('find' commands, for example). Any command I tried that accessed the NFS drives (sync, du, ls, etc) would immediately lock up in the same state. These processes could not be killed without rebooting.

Thinking it might be a bad disk, and since we were planning an upgrade anyway, we built a new server.
New motherboard/CPU, new boot drive, replaced the data drives with a new RAID system, etc. The new system is an AMD/64, the old one was AMD/i386. The new one is running 6.1. It ran great for about 3 weeks, then the same thing happened. It has nothing in common with the old server as far as hardware goes.

Both servers had the same set of clients, which are all FreeBSD 4.11 using TCP. These haven't changed, except they are getting busier all the time. The NFS traffic is on a dedicated switched gigabit network, nice and fast.

The problem seems to occur when a file-system-intensive task is run right on the NFS server itself during heavy NFS usage. Things like using 'find' to delete old files, or using 'du' on a large set of directories, etc.

Does this ring any bells for anyone? Are there any known issues with the NFS code that would cause this? Any known solutions?

Mark

--
Mark Morley
Owner / Administrator
Islandnet.com