Date: Sat, 23 Oct 2004 00:46:23 -0400
From: Mikhail Teterin <mi+kde@aldan.algebra.com>
To: current@FreeBSD.org
Subject: 5.3-STABLE hangs under load (by bufdaemon)
Message-ID: <200410230046.24235@aldan>
5.3-STABLE amd64. Under heavy load -- database dumps over NFS to local
disk -- there comes a point when the system process `bufdaemon' starts
taking almost the entire CPU.
The machine stops doing anything else, but a `systat' started earlier
continues to work. After two hours of this, even that stops, and
systat's display remains frozen at:
------------------------------------------------------------------------
4 users Load 3.07 2.90 2.86 Oct 23 00:05
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 29220 8624 66660 13100 1356976 count
All 657940 11900 1333164 18692 pages
zfod Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt cow 1490 total
5 41 1628 4 162 1857 9 248892 wire 1: atkb
21364 act 1026 0: clk
93.9%Sys 0.8%Intr 0.0%User 0.0%Nice 5.3%Idl 390852 inact 6: fdc0
| | | | | | | | | | 8 cache 128 8: rtc
=============================================== 1356968 free 160 9: acpi
daefr 14: ata
Namei Name-cache Dir-cache prcfr 15: ata
Calls hits % hits % react 8 16: ahc
pdwak 160 17: pcm
pdpgs 8 24: bge
Disks afd0 ad6 amrd0 sa0 pass0 intrn 26: amr
KB/t 0.00 16.00 0.00 0.00 0.00 218832 buf
tps 0 161 0 0 0 3106 dirtybuf
MB/s 0.00 2.51 0.00 0.00 0.00 100000 desiredvnodes
% busy 0 7 0 0 0 807 numvnodes
Showing vmstat, refresh every 1 seconds. 247
------------------------------------------------------------------------
ad6 is the disk in question. What it is doing at 2.51MB/s for two
hours remains a mystery -- as far as the NFS client can tell, the server
stopped responding long ago.
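For what it is worth, the 2.51 figure at least looks consistent with the
other ad6 columns (KB/t times tps). A minimal sketch of that arithmetic
follows; the function name is made up for illustration, and the
1 MB = 1024 KB scaling is my assumption about how systat converts, not
something I have checked in its source:

```python
# Values read off the ad6 column of the frozen systat display.
KB_PER_TRANSFER = 16.00   # KB/t
TRANSFERS_PER_SEC = 161   # tps


def throughput_mb_per_sec(kb_per_transfer, transfers_per_sec):
    """Return throughput in MB/s, ASSUMING 1 MB = 1024 KB (hypothetical scaling)."""
    return kb_per_transfer * transfers_per_sec / 1024.0


if __name__ == "__main__":
    rate = throughput_mb_per_sec(KB_PER_TRANSFER, TRANSFERS_PER_SEC)
    # Compare by eye against the MB/s row of the display.
    print(f"{rate:.4f} MB/s")
```

So the disk does seem to be issuing steady 16KB transfers the whole time,
which only says the display is self-consistent, not what the writes are.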
Any advice on tuning this? The machine has 2GB of RAM and runs on a
single Opteron. Shortly before going into this coma, the system reports
write errors on ad6:
Oct 22 21:31:24 pandora kernel: ad6: FAILURE - WRITE_DMA
status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=370211679
Oct 22 21:31:32 pandora kernel: ad6: TIMEOUT - WRITE_DMA retrying (2 retries
left) LBA=373975135
but why would a device's trouble cause bufdaemon to freak out?
-mi
