Date: Sat, 1 Jan 2000 12:10:55 -0800 (PST)
From: dillon@backplane.com
To: FreeBSD-gnats-submit@freebsd.org
Subject: kern/15825: Softupdates gets behind, runs the system out of KVM
Message-ID: <200001012010.MAA92570@apollo.backplane.com>
>Number: 15825
>Category: kern
>Synopsis: Softupdates gets behind, runs the system out of KVM
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Jan 1 12:20:00 PST 2000
>Closed-Date:
>Last-Modified:
>Originator: Matthew Dillon
>Release: FreeBSD 3.4-STABLE i386
>Organization:
Backplane Inc.
>Environment:
FreeBSD 3.4, UP configuration, 512MB RAM, fast (40MB/sec) Adaptec
SCSI subsystem, fast disks (Seagate 18G drives).
>Description:
In tests with postmark, and also noted with programs like postfix (a mail
backend), softupdates can get bogged down with directory add/remove
dependencies that cause the number of dependencies in several softupdates
categories to increase continuously until the system runs out of KVM.
A second bug found: When ^Zing a postmark process I found that it
stopped right in the middle of a softupdates sleep while softupdates
was holding at least one lock. The other three postmark processes
COULD NOT BE INTERRUPTED while the first one was in a stopped state.
Attempting to limit the number of dependencies with debug.max_softdeps
does not stop the problem and may in fact exacerbate it.
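(For reference, that knob is just a sysctl. These are illustrative commands,
not a transcript from the test box, and the 4096 is an arbitrary example value;
as noted above, lowering it did not help here:
    sysctl debug.max_softdeps
    sysctl -w debug.max_softdeps=4096
)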
--
Ho ho! Overnight the KVM usage in my postmark test on my FreeBSD-3.x
test box jumped from 8MB to 51MB.
My 4.x test box has remained stable at 18MB - no jump in KVM usage.
Now I am having a hell of a time trying to stop the four postmark processes.
If I stop one it prevents the others from being stopped. Softupdates
seems to allow processes to be stopped while holding locks! That's a
bug, but not the one causing the KVM growth.
st4# ps axl | fgrep post
0 684 335 0 18 0 6472 5976 softup T p1 71:26.51 postmark
0 863 335 0 -2 0 6472 6076 getblk D p1 125:05.16 postmark
0 864 335 0 -2 0 6472 6076 getblk D p1 127:46.63 postmark
0 866 335 0 18 0 6472 6076 - T p1 133:22.19 postmark
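(Presumably resuming the stopped process - 'kill -CONT 684', or fg'ing it in
its shell - would let it finish its softupdates sleep and drop the lock, at
which point the two processes wedged in getblk could be killed. I haven't
verified that, but it is consistent with the lock being held across the stop.)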
'sync' has no real effect, even after I kill the processes.
iostat shows that ccd0 is completely saturated.
test4# iostat ccd0 1
tty ccd0 cpu
tin tout KB/t tps MB/s us ni sy in id
0 21 0.00 0 0.00 1 0 30 1 68
0 43 4.82 127 0.60 1 0 14 0 85
0 43 6.91 347 2.34 0 1 12 1 86
0 43 6.92 212 1.43 0 0 0 1 99
0 42 5.80 122 0.69 0 1 22 0 77
0 43 6.04 130 0.76 0 0 26 0 74
Here is the first vmstat -m output:
NQNFS Lease 1 1K 1K 85696K 1 0 0 1K
NFS hash 1 128K 128K 85696K 1 0 0 128K
pagedep 64 20K 30K 85696K 150650 0 0 64,16K
inodedep 95866 12112K 12468K 85696K 12084672 0 0 128,128K
newblk 1 1K 1K 85696K 31740889 0 0 32,256
bmsafemap 73 3K 4K 85696K 1994953 0 0 32
allocdirect 325 21K 354K 85696K 31185275 0 0 64
indirdep 10 41K 129K 85696K 353964 0 0 32,8K
allocindir 12 1K 9K 85696K 555613 0 0 64
freefrag 521374 16293K 16343K 85696K 19393498 0 0 32
freeblks 72564 9071K 9147K 85696K 2318532 0 0 128
freefile 72564 2268K 2287K 85696K 2318532 0 0 32
diradd 483 16K 193K 85696K 2528350 0 0 32
mkdir 0 0K 1K 85696K 8 0 0 32
dirrem 95262 2977K 2988K 85696K 2413794 0 0 32
FFS node 25743 6436K 6436K 85696K 3221821 0 0 256
MFS node 1 1K 1K 85696K 3 0 0 64,256
UFS ihash 1 128K 128K 85696K 1 0 0 128K
UFS mount 21 52K 52K 85696K 24 0 0 512,2K,4K,32K
ZONE 18 3K 3K 85696K 18 0 0 128
mbuf 1 4K 4K 85696K 1 0 0 4K
memdesc 1 4K 4K 85696K 1 0 0 4K
Memory Totals: In Use Free Requests
51035K 2398K 156200252
Here is the second vmstat -m output, a few minutes after I've killed
the four postmark processes. The softupdates dependencies are slowly
draining.
NQNFS Lease 1 1K 1K 85696K 1 0 0 1K
NFS hash 1 128K 128K 85696K 1 0 0 128K
pagedep 1 16K 30K 85696K 151028 0 0 64,16K
inodedep 79358 10048K 12468K 85696K 12120648 0 0 128,128K
newblk 1 1K 1K 85696K 31781645 0 0 32,256
bmsafemap 0 0K 4K 85696K 1999787 0 0 32
allocdirect 0 0K 354K 85696K 31225336 0 0 64
indirdep 0 0K 129K 85696K 354517 0 0 32,8K
allocindir 0 0K 9K 85696K 556308 0 0 64
freefrag 423942 13249K 16365K 85696K 19416872 0 0 32
freeblks 75076 9385K 9460K 85696K 2338023 0 0 128
freefile 75076 2347K 2365K 85696K 2338023 0 0 32
diradd 0 0K 193K 85696K 2531655 0 0 32
mkdir 0 0K 1K 85696K 8 0 0 32
dirrem 79088 2472K 2988K 85696K 2417111 0 0 32
FFS node 25743 6436K 6436K 85696K 3340478 0 0 256
MFS node 1 1K 1K 85696K 3 0 0 64,256
UFS ihash 1 128K 128K 85696K 1 0 0 128K
UFS mount 21 52K 52K 85696K 24 0 0 512,2K,4K,32K
ZONE 18 3K 3K 85696K 18 0 0 128
mbuf 1 4K 4K 85696K 1 0 0 4K
memdesc 1 4K 4K 85696K 1 0 0 4K
Memory Totals: In Use Free Requests
46714K 6718K 156702268
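The individual softupdates types can be pulled out by name to watch them
drain, e.g. (illustrative; the type names are taken from the output above):
    vmstat -m | egrep 'pagedep|inodedep|dirrem|diradd|freefrag|freeblks|freefile'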
The drain rate:
test4# while (1)
while? vmstat -m | tail -2
while? sleep 10
while? end
Memory Totals: In Use Free Requests (10 seconds per)
34127K 19334K 156997508
Memory Totals: In Use Free Requests
33262K 20199K 157014568
Memory Totals: In Use Free Requests
32303K 21157K 157029536
Memory Totals: In Use Free Requests
31287K 22174K 157045809
Memory Totals: In Use Free Requests
30471K 22989K 157063038
Memory Totals: In Use Free Requests
29270K 24191K 157079301
Memory Totals: In Use Free Requests
28361K 25100K 157099823
Memory Totals: In Use Free Requests
27123K 26338K 157117218
Memory Totals: In Use Free Requests
25984K 27520K 157132238
Memory Totals: In Use Free Requests
25760K 27913K 157151309
Memory Totals: In Use Free Requests
25463K 28322K 157182362
...
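In the samples shown, that works out to roughly 8.5MB freed over about 100
seconds (34127K down to 25463K across ten 10-second intervals), i.e. a bit
under 1MB per sample - so even with the load gone it takes on the order of
minutes to unwind the backlog.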
It's obvious to me what is going on. First we have a serious bug
somewhere in the softupdates code that is allowing signal-stop to
occur while softupdates is waiting for a lock. But what is really
causing all hell to break loose is a combination of softupdates building
up a huge set of interrelated dependencies that eats a *lot* of disk
bandwidth to unwind (due to seeking back and forth), and FreeBSD-3.x not
flushing the 'right' buffers.
I'm not sure what 4.x is doing that is making it less susceptible to the
softupdates problem. It's quite obvious to me that 3.x is flushing
its buffers non-optimally (well, we knew that already, that's one reason
why getnewbuf() was rewritten and buf_daemon added!) but it's hard to say
what 'optimal' should be since neither 3.x's nor 4.x's buffer cache is
softupdates-aware (neither can tell whether a buffer will be redirtied
or not when it flushes it).
Kirk relies on the update daemon to flush vnodes out in the correct
order but this tends to break down badly in a heavily loaded system.
What we are left with is a non-optimal flush coupled with a huge set of
interrelated dependencies.
I also recall that the file-remove case is a complex special case with
softupdates (the directory-entry removal has to be committed to disk before
the inode and its blocks can actually be freed, so each remove drags a chain
of ordered writes behind it). Considering the number of 'dirrem' softupdates
elements allocated, I am guessing that this is the core of the problem.
A vmstat -m on my 4.x test box, running the same postmark test for the
same amount of time (about 24 hours) shows:
inodedep 798 356K 2638K 102400K 11294336 0 0 128,256K
newblk 1 1K 1K 102400K 10117649 0 0 32,256
bmsafemap 22 1K 12K 102400K 3241677 0 0 32
allocdirect 435 28K 389K 102400K 10117240 0 0 64
indirdep 0 0K 65K 102400K 64 0 0 32,8K,32K
allocindir 0 0K 10K 102400K 408 0 0 64
freefrag 270 9K 73K 102400K 2706257 0 0 32
freeblks 156 20K 1673K 102400K 4255762 0 0 128
freefile 156 5K 419K 102400K 4255793 0 0 32
diradd 219 7K 582K 102400K 4342945 0 0 32
mkdir 0 0K 1K 102400K 12 0 0 32
dirrem 96 3K 430K 102400K 4255939 0 0 32
FFS node 47959 11990K 12415K 102400K 4728732 0 0 256
UFS ihash 1 256K 256K 102400K 1 0 0 256K
UFS mount 18 49K 49K 102400K 18 0 0 512,2K,4K,32K
VM pgdata 1 256K 256K 102400K 1 0 0 256K
ZONE 18 3K 3K 102400K 18 0 0 128
isadev 12 1K 1K 102400K 12 0 0 64
ATA generic 3 1K 1K 102400K 3 0 0 128
ATAPI generic 2 1K 1K 102400K 3 0 0 32,128,256
ACD driver 3 2K 2K 102400K 3 0 0 16,256,1K
devbuf 749 407K 1654K 102400K 22387016 0 0 16,32,64,128,256,512,1K,2K,4K,8K,16K,32K
mbuf 1 4K 4K 102400K 1 0 0 4K
memdesc 1 4K 4K 102400K 1 0 0 4K
isa_devlist 19 3K 3K 102400K 19 0 0 16,512,2K
atkbddev 2 1K 1K 102400K 2 0 0 16
Memory Totals: In Use Free Requests
18292K 7244K 97850590
The worst case KVM usage didn't blow up like it did on the 3.x box,
though it is still using a considerable amount of memory - 18+7 = 25MB
at peak. But when I observe it in real time it is clear to me that
although directory file removal dependencies build up, they appear to
drain quickly enough to not pose a problem. For example, I see 'dirrem'
usage jump around between 0 and 200. I see 'diradd' usage build up
to around 450 and then stabilize and finally drop down again.
>How-To-Repeat:
Create a large (18G or larger) partition and mount it (here /partition).
(cd /usr/ports/benchmarks/postmark; make; make install)
rehash
mkdir /partition/test1
mkdir /partition/test2
mkdir /partition/test3
mkdir /partition/test4
(cd /partition/test1; postmark) (run four in parallel)
(cd /partition/test2; postmark)
(cd /partition/test3; postmark)
(cd /partition/test4; postmark)
Use the following parameters for each postmark:
set number 30000
set transactions 4000000
set size 1500 200000
run
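The above can be scripted - an untested sketch, assuming postmark reads its
commands from a redirected stdin and that the test directories were created
under /partition as above:
#!/bin/csh
# Write the postmark command script once (parameters from above).
cat > /tmp/pm.cmds << EOF
set number 30000
set transactions 4000000
set size 1500 200000
run
EOF
# Kick off four postmark runs in parallel, one per test directory.
foreach d (test1 test2 test3 test4)
	(cd /partition/$d; postmark < /tmp/pm.cmds) &
end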
>Fix:
None as yet.
>Release-Note:
>Audit-Trail:
>Unformatted:
