From owner-freebsd-fs@FreeBSD.ORG Tue Apr 11 15:12:36 2006
From: Eric Anderson <anderson@centtech.com>
To: Nicolas KOWALSKI
Cc: freebsd-fs@freebsd.org
Date: Tue, 11 Apr 2006 10:12:21 -0500
Subject: Re: [patch] giant-less quotas for UFS
Message-ID: <443BC755.1080905@centtech.com>

Nicolas KOWALSKI wrote:
> Eric Anderson writes:
>
>> Nicolas KOWALSKI wrote:
>>> Eric Anderson writes:
>>>
>>>> Nicolas KOWALSKI wrote:
>>>>> Eric Anderson writes:
>>>>>
>>>>>> Nicolas KOWALSKI wrote:
>>>>>>> Yes, this is exactly what is happening. To add some precision,
>>>>>>> some students here use calculation applications that allocate a
>>>>>>> lot of disk space, usually more than their allowed home quotas;
>>>>>>> when by mistake they launch these apps in their home directories
>>>>>>> instead of their workstations' dedicated space, it brings the
>>>>>>> server to its knees on the NFS client side.
>>>>>> When you say 'to its knees' - what do you mean exactly? How many
>>>>>> clients do you have, how much memory is on the server, and how
>>>>>> many nfsd threads are you using? What kind of load average do you
>>>>>> see during this (on the server)?
>>>>> Sorry for the imprecision.
>>>>> The server is a dual-Xeon 2.8GHz with 2GB of RAM, using an Ultra320
>>>>> SCSI controller and 76GB disks. It is accessed over NFS by ~100
>>>>> Unix (Linux, Solaris) clients, and over Samba by ~15 Windows XP
>>>>> machines. The network connection is Gigabit Ethernet.
>>>>> During slowdowns, it is only from the NFS clients' point of view
>>>>> that the server does not respond. For example, a simple 'ls' in my
>>>>> home directory is normally almost immediate, but when the server
>>>>> slows down it can take up to 2 minutes. On the server, the load
>>>>> average goes to 0.5, compared to a usual maximum of 0.15-0.20. top
>>>>> shows the nfsd processes in the "biowr" state, but nothing is
>>>>> really being written, because the quota system blocks any further
>>>>> writes by the user exceeding his/her quota.
>>>>>
>>>> In this case (which is what I suspected), try bumping up your nfsd
>>>> threads to 128. I set mine very high (I have around 1000 clients),
>>>> and I can say there aren't really ill effects besides a bit of
>>>> memory usage (which you have plenty of). I suspect increasing the
>>>> threads will neutralize this problem for you.
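For the archives: the daemon count is nfsd's -n flag, so in
/etc/rc.conf the change would look roughly like the lines below. I'm
writing this from memory, so double-check it against nfsd(8) on your
release:

    nfs_server_enable="YES"
    # -u and -t serve UDP and TCP; -n sets the number of daemons
    nfs_server_flags="-u -t -n 128"

You'd then restart the NFS server processes (or reboot) for the new
count to take effect.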
>>> Using 128 nfsd threads, I stressed the server by running, on an NFS
>>> client, a small C program writing continuously to a file, so that
>>> the user "biguser" (account stored on /export/home2) exceeds his
>>> quota.
>>> It half-works: during the test, users working on another disk
>>> (/export/home) did not see any difference, but users working on the
>>> same disk as "biguser" (/export/home2) were almost halted.
>>> So this is better, because before everybody was halted, but there is
>>> still a problem.
>>> Any other tips?
>> Watch gstat during the testing, and see if the disk that holds the
>> full partition is really busy. I'm betting it's thrashing the disk,
>> continually checking for free space. I don't think there's any way
>> to avoid that.
>
> Mh, I did not find this "gstat" tool on the system or in the ports;
> perhaps it is in >= 5.x? (The server is running 4.11-p15.)
>
> It is sad that I cannot do anything about it: such a server pulled
> down by a single NFS client. :-(

*sigh* - I should really pay more attention to the beginning of the
thread. I thought you were on 5.x, so my mistake. You'll need to use
iostat to see what's busy with your disks. I strongly recommend going
to 6.x if you can.

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
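P.S. For anyone who wants to reproduce Nicolas's test, the filler
program is roughly like this. This is my own reconstruction, not his
actual code - the filename is made up, and the point is just that
write(2) starts failing with EDQUOT once the quota is hit:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	char buf[65536];
	int fd;

	/* Append to a scratch file in the quota'd home directory. */
	fd = open("quota-filler.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0) {
		perror("open");
		return (1);
	}
	memset(buf, 'x', sizeof(buf));

	/*
	 * Write continuously. Once the quota is exceeded, every
	 * write(2) fails with EDQUOT, but we keep trying - that
	 * retry loop is what hammers the nfsd threads on the server.
	 */
	for (;;) {
		if (write(fd, buf, sizeof(buf)) < 0)
			perror("write");
	}
}

Run it as "biguser" on an NFS client while watching the server with
"iostat 1" (gstat being 5.x and later) to see whether the disk holding
the full filesystem is the busy one.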