From: Uwe Doering
Date: Sat, 07 Aug 2004 07:30:47 +0200
To: freebsd-questions@freebsd.org
Subject: Re: identifying and fixing server I/O slowdowns
Message-ID: <41146907.90809@geminix.org>

Jeff Kramer wrote:
> Oh great and wise FreeBSD gurus,
>
> I've been running FreeBSD boxes for about five years with great results
> (up to 6 at the moment), but recently one of my machines has started to
> seriously act up. Every time a heavy disk operation (say, tar'ing a 1
> gig directory) occurs the system slows to a crawl, and requests to
> apache/php/mysql sites hosted on it just hang.
>
> The system is a dual p3 1.13ghz box with a gig of ram and mirrored 80
> gig WD800BB drives on a Promise TX2 controller. The raid isn't
> degraded. There's a dedicated 1.5 gig swap partition and a swap file on
> the /usr partition.
> We had some apache processes go nuts one time,
> which is why I added the swap file.
>
[...]

This problem could be due to a disk drive that is about to fail. If
there are (still recoverable) disk errors, retrying the affected I/O
operations can keep a disk controller occupied for several seconds. Of
course, all processes trying to do disk I/O during this time span will
block.

Since the errors are (eventually) recoverable, the raid array is likely
to _not_ drop into degraded mode by itself. After you've found out
which of the disks it is, you would have to force that disk into failed
mode and then replace it. The exact details depend on your raid
controller.

Of course, your mileage may vary, but I've experienced disk failures
like these several times in the past, with the effect you've described.

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net
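
As a rough illustration of how one might look for the multi-second
stalls described above, here is a minimal sketch in Python. It is not
part of the original advice and makes several assumptions: the function
name find_slow_reads, the 1 MB chunk size and the 1 second threshold
are all made up for the example, and it only times plain sequential
reads. A read that fails outright (rather than merely being slow) would
raise an OSError, which the sketch does not handle.

```python
import os
import time


def find_slow_reads(path, chunk_size=1024 * 1024, threshold=1.0, max_chunks=None):
    """Read `path` sequentially and time every chunk.

    Returns a list of (offset, seconds) tuples, one for each chunk whose
    read took at least `threshold` seconds. All names and default values
    here are hypothetical; tune them for the system at hand.
    """
    slow = []
    offset = 0
    chunks = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while max_chunks is None or chunks < max_chunks:
            start = time.monotonic()
            data = os.read(fd, chunk_size)
            elapsed = time.monotonic() - start
            if not data:
                break  # end of file or device reached
            if elapsed >= threshold:
                # A read this slow may indicate the drive retrying internally.
                slow.append((offset, elapsed))
            offset += len(data)
            chunks += 1
    finally:
        os.close(fd)
    return slow
```

One would presumably run this as root against each member disk of the
mirror in turn (the device path depends on the system and controller
driver, so none is given here) and compare the results. Reads served
from a cache will look fast regardless of the disk's health, so a large
file on a mounted filesystem is a less reliable target than the disk
device itself, and a clean run does not prove the disk is healthy.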