From: Mikhail Teterin
To: current@FreeBSD.org
Date: Sat, 23 Oct 2004 00:46:23 -0400
Subject: 5.3-STABLE hangs under load (by bufdaemon)

5.3-STABLE amd64. Under heavy load (database dumps over NFS to a local disk), there comes a point when the system process `bufdaemon' starts taking almost the entire CPU. The machine stops doing anything else, but the `systat' started earlier continues to work.
After two hours of this, even that stops, and the systat display remains frozen at:

------------------------------------------------------------------------
    4 users    Load  3.07  2.90  2.86                  Oct 23 00:05
Mem:KB    REAL           VIRTUAL                  VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free     in   out     in   out
Act   29220    8624    66660    13100 1356976 count
All  657940   11900  1333164    18692         pages
                                                          zfod   Interrupts
Proc:r  p  d  s  w    Csw  Trp  Sys  Int  Sof  Flt        cow    1490 total
     5       41      1628    4  162 1857    9      248892 wire        1: atkb
                                                    21364 act    1026 0: clk
93.9%Sys   0.8%Intr  0.0%User  0.0%Nice  5.3%Idl   390852 inact       6: fdc0
|    |    |    |    |    |    |    |    |    |          8 cache   128 8: rtc
===============================================   1356968 free    160 9: acpi
                                                          daefr       14: ata
Namei         Name-cache    Dir-cache                     prcfr       15: ata
    Calls     hits    %     hits    %                     react     8 16: ahc
                                                          pdwak   160 17: pcm
                                                          pdpgs     8 24: bge
Disks   afd0   ad6 amrd0   sa0 pass0                      intrn       26: amr
KB/t    0.00 16.00  0.00  0.00  0.00               218832 buf
tps        0   161     0     0     0                 3106 dirtybuf
MB/s    0.00  2.51  0.00  0.00  0.00               100000 desiredvnodes
% busy     0     7     0     0     0                  807 numvnodes
Showing vmstat, refresh every 1 seconds.              247
------------------------------------------------------------------------

ad6 is the disk in question. What it is doing at 2.51 MB/s for two hours remains a mystery: as far as the NFS client can tell, the server stopped responding long ago.

Any advice on tuning this? The machine has 2 GB of RAM and runs on a single Opteron.

Shortly before going into this coma, the system reported write errors on ad6:

Oct 22 21:31:24 pandora kernel: ad6: FAILURE - WRITE_DMA status=51 error=4 LBA=370211679
Oct 22 21:31:32 pandora kernel: ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=373975135

But why would trouble with one device cause bufdaemon to freak out?

-mi