From owner-freebsd-current@FreeBSD.ORG  Sat Oct 23 04:46:26 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 65B3316A4CE
	for <current@FreeBSD.org>; Sat, 23 Oct 2004 04:46:26 +0000 (GMT)
Received: from aldan.algebra.com (aldan.algebra.com [216.254.65.224])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D182043D39
	for <current@FreeBSD.org>; Sat, 23 Oct 2004 04:46:25 +0000 (GMT)
	(envelope-from mi+kde@aldan.algebra.com)
Received: from aldan.algebra.com (mi@localhost [127.0.0.1])
	by aldan.algebra.com (8.13.1/8.13.1) with ESMTP id i9N4kPj7007878
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <current@FreeBSD.org>; Sat, 23 Oct 2004 00:46:25 -0400 (EDT)
	(envelope-from mi+kde@aldan.algebra.com)
Received: from localhost (localhost [[UNIX: localhost]])
	by aldan.algebra.com (8.13.1/8.13.1/Submit) id i9N4kO52007877
	for current@FreeBSD.org; Sat, 23 Oct 2004 00:46:24 -0400 (EDT)
	(envelope-from mi+kde@aldan.algebra.com)
From: Mikhail Teterin <mi+kde@aldan.algebra.com>
To: current@FreeBSD.org
Date: Sat, 23 Oct 2004 00:46:23 -0400
User-Agent: KMail/1.7
X-Face: %UW#n0|w>ydeGt/b@1-.UFP=K^~-:0f#O:D7w<gv/&E-lL7twZCT8B~/PA4|\t$ti+22K">hJ5G_<5143Bb3kOIs9XpX+"V+~$adGP:J|SLieM31VIhqXeLBli"<kcG^EOVihy+z3/UR{6SCQ
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200410230046.24235@aldan>
X-Mailman-Approved-At: Sat, 23 Oct 2004 13:28:49 +0000
Subject: 5.3-STABLE hangs under load (by bufdaemon)
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Oct 2004 04:46:26 -0000

This is 5.3-STABLE on amd64. Under heavy load -- database dumps over NFS
to local disk -- there comes a point when the system process `bufdaemon'
starts taking almost the entire CPU.

The machine stops doing anything else, but a `systat' started earlier
continues to work. After two hours of this, even that stops, and the
systat display remains frozen at:

------------------------------------------------------------------------
    4 users    Load  3.07  2.90  2.86                  Oct 23 00:05
Mem:KB    REAL            VIRTUAL                     VN PAGER  SWAP PAGER
        Tot   Share      Tot    Share    Free         in  out     in out
Act   29220    8624    66660    13100 1356976 count
All  657940   11900  1333164    18692         pages
                                                          zfod Interrupts
Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof  Flt        cow    1490 total
           5 41      1628    4  162 1857    9      248892 wire        1: atkb
                                                    21364 act    1026 0: clk
93.9%Sys   0.8%Intr  0.0%User  0.0%Nice  5.3%Idl   390852 inact       6: fdc0
|    |    |    |    |    |    |    |    |    |          8 cache   128 8: rtc
===============================================   1356968 free    160 9: acpi
                                                          daefr 14: ata
Namei         Name-cache    Dir-cache                     prcfr 15: ata
    Calls     hits    %     hits    %                     react     8 16: ahc
                                                          pdwak   160 17: pcm
                                                          pdpgs     8 24: bge
Disks  afd0   ad6 amrd0   sa0 pass0                       intrn 26: amr
KB/t   0.00 16.00  0.00  0.00  0.00                218832 buf
tps       0   161     0     0     0                  3106 dirtybuf
MB/s   0.00  2.51  0.00  0.00  0.00                100000 desiredvnodes
% busy    0     7     0     0     0                   807 numvnodes
Showing vmstat, refresh every 1 seconds.              247
------------------------------------------------------------------------

ad6 is the disk in question. What it is doing at 2.51 MB/s for two
hours remains a mystery -- as far as the NFS client can tell, the server
stopped responding long ago.
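
As a sanity check on the display above, the ad6 figures are at least
self-consistent: the KB/t and tps columns multiply out to roughly the
MB/s column. A minimal sketch in Python; the function name and the
1024 KB-per-MB factor are assumptions for illustration, not anything
taken from systat itself:

```python
def throughput_mb_per_s(kb_per_transfer, transfers_per_s):
    """Estimate disk throughput from systat's KB/t and tps columns.

    Assumes 1 MB = 1024 KB; systat's own rounding may differ slightly.
    """
    return kb_per_transfer * transfers_per_s / 1024.0


# ad6 values copied from the frozen systat display
ad6_mb_per_s = throughput_mb_per_s(16.00, 161)
print("ad6: about %.1f MB/s" % ad6_mb_per_s)
```

So the disk really does appear to be doing about 160 transfers of 16 KB
each per second, not merely reporting a stale number.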

Any advice on tuning this? The machine has 2 GB of RAM and runs on a
single Opteron. Shortly before going into this coma, the system reports
write errors on ad6:

Oct 22 21:31:24 pandora kernel: ad6: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=370211679
Oct 22 21:31:32 pandora kernel: ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=373975135

but why would a device's trouble cause bufdaemon to freak out?


 -mi