From owner-freebsd-bugs  Sat Jan  1 12:20:07 2000
Delivered-To: freebsd-bugs@freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.ORG [204.216.27.21])
	by hub.freebsd.org (Postfix) with ESMTP id 5543914F41
	for ; Sat, 1 Jan 2000 12:20:02 -0800 (PST)
	(envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.9.3/8.9.2) id MAA19430;
	Sat, 1 Jan 2000 12:20:02 -0800 (PST)
	(envelope-from gnats@FreeBSD.org)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 55FC215023
	for ; Sat, 1 Jan 2000 12:10:56 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id MAA92570;
	Sat, 1 Jan 2000 12:10:55 -0800 (PST)
	(envelope-from dillon)
Message-Id: <200001012010.MAA92570@apollo.backplane.com>
Date: Sat, 1 Jan 2000 12:10:55 -0800 (PST)
From: dillon@backplane.com
Reply-To: dillon@backplane.com
To: FreeBSD-gnats-submit@freebsd.org
X-Send-Pr-Version: 3.2
Subject: kern/15825: Softupdates gets behind, runs the system out of KVM
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Number:         15825
>Category:       kern
>Synopsis:       Softupdates gets behind, runs the system out of KVM
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jan  1 12:20:00 PST 2000
>Closed-Date:
>Last-Modified:
>Originator:     Matthew Dillon
>Release:        FreeBSD 3.4-STABLE i386
>Organization:
Backplane Inc.
>Environment:

	FreeBSD 3.4, UP configuration, 512MB RAM, fast (40MB/sec) Adaptec
	SCSI subsystem, fast disks (Seagate 18GB drives).

>Description:

In tests with postmark, and also noted with programs like postfix (a mail
backend), softupdates can get bogged down with directory add/remove
dependencies, causing the number of dependencies in several softupdates
categories to increase continuously until the system runs out of KVM.

A second bug was also found: when ^Z'ing a postmark process I found that it
stopped right in the middle of a softupdates sleep while softupdates was
holding at least one lock.  The other three postmark processes COULD NOT BE
INTERRUPTED while the first one was in the stopped state.

Attempting to limit the number of dependencies with debug.max_softdeps does
not stop the problem and may in fact exacerbate it.
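(For reference, the debug.max_softdeps knob can be read and lowered with
sysctl(8) from the shell, or programmatically along the following lines.
This is a hypothetical helper, not part of the base system, and it assumes
the knob is exported as a plain int.)

/* maxsoftdeps.c - read, and optionally set, debug.max_softdeps.
 * Hypothetical helper, not part of the base system; assumes the
 * OID is a plain int.  Build: cc -o maxsoftdeps maxsoftdeps.c
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	int val, newval;
	size_t len = sizeof(val);

	/* read the current dependency limit */
	if (sysctlbyname("debug.max_softdeps", &val, &len, NULL, 0) < 0) {
		perror("sysctlbyname");
		exit(1);
	}
	printf("debug.max_softdeps = %d\n", val);

	/* optionally lower (or raise) the limit */
	if (argc > 1) {
		newval = atoi(argv[1]);
		if (sysctlbyname("debug.max_softdeps", NULL, NULL,
		    &newval, sizeof(newval)) < 0) {
			perror("sysctlbyname(set)");
			exit(1);
		}
	}
	return (0);
}

As noted above, though, lowering the limit does not actually stop the
dependency buildup.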
--

Ho ho!  Overnight the KVM usage in my postmark test on my FreeBSD-3.x test
box jumped from 8MB to 51MB.  My 4.x test box has remained stable at 18MB -
no jump in KVM usage.

Now I am having a hell of a time trying to stop the four postmark processes.
If I stop one, it prevents the others from being stopped.  Softupdates seems
to allow processes to be stopped while holding locks!  That's a bug, but not
the one causing the KVM usage.

st4# ps axl | fgrep post
  0   684   335   0  18  0  6472 5976 softup T  p1   71:26.51 postmark
  0   863   335   0  -2  0  6472 6076 getblk D  p1  125:05.16 postmark
  0   864   335   0  -2  0  6472 6076 getblk D  p1  127:46.63 postmark
  0   866   335   0  18  0  6472 6076 -      T  p1  133:22.19 postmark

'sync' has no real effect, even after I kill the processes.  iostat shows
that ccd0 is completely saturated:

test4# iostat ccd0 1
      tty             ccd0             cpu
 tin tout  KB/t tps  MB/s  us ni sy in id
   0   21  0.00   0  0.00   1  0 30  1 68
   0   43  4.82 127  0.60   1  0 14  0 85
   0   43  6.91 347  2.34   0  1 12  1 86
   0   43  6.92 212  1.43   0  0  0  1 99
   0   42  5.80 122  0.69   0  1 22  0 77
   0   43  6.04 130  0.76   0  0 26  0 74

Here is the first vmstat -m output:

        Type  InUse  MemUse HighUse   Limit   Requests  Limit Limit Size(s)
 NQNFS Lease      1      1K      1K  85696K          1     0     0  1K
    NFS hash      1    128K    128K  85696K          1     0     0  128K
     pagedep     64     20K     30K  85696K     150650     0     0  64,16K
    inodedep  95866  12112K  12468K  85696K   12084672     0     0  128,128K
      newblk      1      1K      1K  85696K   31740889     0     0  32,256
   bmsafemap     73      3K      4K  85696K    1994953     0     0  32
 allocdirect    325     21K    354K  85696K   31185275     0     0  64
    indirdep     10     41K    129K  85696K     353964     0     0  32,8K
  allocindir     12      1K      9K  85696K     555613     0     0  64
    freefrag 521374  16293K  16343K  85696K   19393498     0     0  32
    freeblks  72564   9071K   9147K  85696K    2318532     0     0  128
    freefile  72564   2268K   2287K  85696K    2318532     0     0  32
      diradd    483     16K    193K  85696K    2528350     0     0  32
       mkdir      0      0K      1K  85696K          8     0     0  32
      dirrem  95262   2977K   2988K  85696K    2413794     0     0  32
    FFS node  25743   6436K   6436K  85696K    3221821     0     0  256
    MFS node      1      1K      1K  85696K          3     0     0  64,256
   UFS ihash      1    128K    128K  85696K          1     0     0  128K
   UFS mount     21     52K     52K  85696K         24     0     0  512,2K,4K,32K
        ZONE     18      3K      3K  85696K         18     0     0  128
        mbuf      1      4K      4K  85696K          1     0     0  4K
     memdesc      1      4K      4K  85696K          1     0     0  4K

Memory Totals:  In Use    Free    Requests
                51035K   2398K  156200252

Here is the second vmstat -m output, a few minutes after I've killed the
four postmark processes.  The softupdates dependencies are slowly draining.

        Type  InUse  MemUse HighUse   Limit   Requests  Limit Limit Size(s)
 NQNFS Lease      1      1K      1K  85696K          1     0     0  1K
    NFS hash      1    128K    128K  85696K          1     0     0  128K
     pagedep      1     16K     30K  85696K     151028     0     0  64,16K
    inodedep  79358  10048K  12468K  85696K   12120648     0     0  128,128K
      newblk      1      1K      1K  85696K   31781645     0     0  32,256
   bmsafemap      0      0K      4K  85696K    1999787     0     0  32
 allocdirect      0      0K    354K  85696K   31225336     0     0  64
    indirdep      0      0K    129K  85696K     354517     0     0  32,8K
  allocindir      0      0K      9K  85696K     556308     0     0  64
    freefrag 423942  13249K  16365K  85696K   19416872     0     0  32
    freeblks  75076   9385K   9460K  85696K    2338023     0     0  128
    freefile  75076   2347K   2365K  85696K    2338023     0     0  32
      diradd      0      0K    193K  85696K    2531655     0     0  32
       mkdir      0      0K      1K  85696K          8     0     0  32
      dirrem  79088   2472K   2988K  85696K    2417111     0     0  32
    FFS node  25743   6436K   6436K  85696K    3340478     0     0  256
    MFS node      1      1K      1K  85696K          3     0     0  64,256
   UFS ihash      1    128K    128K  85696K          1     0     0  128K
   UFS mount     21     52K     52K  85696K         24     0     0  512,2K,4K,32K
        ZONE     18      3K      3K  85696K         18     0     0  128
        mbuf      1      4K      4K  85696K          1     0     0  4K
     memdesc      1      4K      4K  85696K          1     0     0  4K

Memory Totals:  In Use    Free    Requests
                46714K   6718K  156702268

The drain rate, sampled every 10 seconds:

test4# while (1)
while? vmstat -m | tail -2
while? sleep 10
while? end
Memory Totals:  In Use    Free    Requests
                34127K  19334K  156997508
Memory Totals:  In Use    Free    Requests
                33262K  20199K  157014568
Memory Totals:  In Use    Free    Requests
                32303K  21157K  157029536
Memory Totals:  In Use    Free    Requests
                31287K  22174K  157045809
Memory Totals:  In Use    Free    Requests
                30471K  22989K  157063038
Memory Totals:  In Use    Free    Requests
                29270K  24191K  157079301
Memory Totals:  In Use    Free    Requests
                28361K  25100K  157099823
Memory Totals:  In Use    Free    Requests
                27123K  26338K  157117218
Memory Totals:  In Use    Free    Requests
                25984K  27520K  157132238
Memory Totals:  In Use    Free    Requests
                25760K  27913K  157151309
Memory Totals:  In Use    Free    Requests
                25463K  28322K  157182362
...

That works out to only about 900K of dependencies freed per 10 seconds on
average, so the remaining ~25MB will take several more minutes to unwind.

It's obvious to me what is going on.  First, we have a serious bug somewhere
in the softupdates code that is allowing a signal-stop to occur while
softupdates is waiting for a lock.
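To illustrate the suspected pattern (this is a hypothetical sketch, NOT the
actual sys/ufs/ffs/ffs_softdep.c code): if the dependency code tsleep()s
with PCATCH while its lock is held, a SIGTSTP delivered during the sleep
stops the process without the lock ever being released, and every other
process that then needs the lock - or a buffer queued behind it - wedges in
getblk exactly as the ps output above shows.

/* Hypothetical sketch of the suspected bug -- not the real
 * ffs_softdep.c source.  'softdep_locked' stands in for whatever
 * lock softupdates is actually holding across the sleep.
 */
#include <sys/param.h>
#include <sys/systm.h>

static int softdep_locked;	/* stand-in for the real lock */

static void
wait_for_dependency(void *chan)
{
	softdep_locked = 1;	/* lock held from here on */
	/*
	 * PCATCH makes the sleep stoppable as well as signal-
	 * interruptible: a ^Z (SIGTSTP) suspends the process right
	 * here, with the lock still held.  The "softup" wmesg
	 * matches the wait channel in the ps output above.
	 */
	(void) tsleep(chan, PRIBIO | PCATCH, "softup", 0);
	softdep_locked = 0;
}

Either the sleep needs to be made non-stoppable (drop PCATCH), or the lock
needs to be released and re-acquired around the sleep.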
But what is really causing all hell to break loose is a combination of
softupdates building up a huge set of interrelated dependencies that eats a
*lot* of disk bandwidth to unwind (due to seeking back and forth), and
FreeBSD-3.x not flushing the 'right' buffers.  I'm not sure what 4.x is
doing that makes it less susceptible to the softupdates problem.  It's quite
obvious to me that 3.x is flushing its buffers non-optimally (well, we knew
that already; that's one reason why getnewbuf() was rewritten and buf_daemon
added!), but it's hard to say what 'optimal' should be, since neither 3.x's
nor 4.x's buffer cache is softupdates-aware (they can't tell whether a
buffer will be redirtied or not when they flush it).  Kirk relies on the
update daemon to flush vnodes out in the correct order, but this tends to
break down badly on a heavily loaded system.

What we are left with is a non-optimal flush coupled with a huge set of
interrelated dependencies.  I also recall that the file-remove case is a
complex special case in softupdates.  Considering the number of 'dirrem'
softupdates elements allocated, I am guessing that this is the core of the
problem.

A vmstat -m on my 4.x test box, running the same postmark test for the same
amount of time (about 24 hours), shows:

        Type  InUse  MemUse HighUse    Limit   Requests  Limit Limit Size(s)
    inodedep    798    356K   2638K  102400K   11294336     0     0  128,256K
      newblk      1      1K      1K  102400K   10117649     0     0  32,256
   bmsafemap     22      1K     12K  102400K    3241677     0     0  32
 allocdirect    435     28K    389K  102400K   10117240     0     0  64
    indirdep      0      0K     65K  102400K         64     0     0  32,8K,32K
  allocindir      0      0K     10K  102400K        408     0     0  64
    freefrag    270      9K     73K  102400K    2706257     0     0  32
    freeblks    156     20K   1673K  102400K    4255762     0     0  128
    freefile    156      5K    419K  102400K    4255793     0     0  32
      diradd    219      7K    582K  102400K    4342945     0     0  32
       mkdir      0      0K      1K  102400K         12     0     0  32
      dirrem     96      3K    430K  102400K    4255939     0     0  32
    FFS node  47959  11990K  12415K  102400K    4728732     0     0  256
   UFS ihash      1    256K    256K  102400K          1     0     0  256K
   UFS mount     18     49K     49K  102400K         18     0     0  512,2K,4K,32K
  VM pgdata       1    256K    256K  102400K          1     0     0  256K
        ZONE     18      3K      3K  102400K         18     0     0  128
      isadev     12      1K      1K  102400K         12     0     0  64
 ATA generic      3      1K      1K  102400K          3     0     0  128
ATAPI generic     2      1K      1K  102400K          3     0     0  32,128,256
  ACD driver      3      2K      2K  102400K          3     0     0  16,256,1K
      devbuf    749    407K   1654K  102400K   22387016     0     0  16,32,64,128,256,512,1K,2K,4K,8K,16K,32K
        mbuf      1      4K      4K  102400K          1     0     0  4K
     memdesc      1      4K      4K  102400K          1     0     0  4K
 isa_devlist     19      3K      3K  102400K         19     0     0  16,512,2K
    atkbddev      2      1K      1K  102400K          2     0     0  16

Memory Totals:  In Use    Free    Requests
                18292K   7244K   97850590

The worst-case KVM usage didn't blow up like it did on the 3.x box, though
it is still using a considerable amount of memory - 18+7 = 25MB at peak.
But when I observe it in real time it is clear to me that although directory
file-removal dependencies build up, they appear to drain quickly enough not
to pose a problem.  For example, I see 'dirrem' usage jump around between 0
and 200.  I see 'diradd' usage build up to around 450, then stabilize, and
finally drop down again.

>How-To-Repeat:

	Create a large (18G or larger) partition.

	(cd /usr/ports/benchmarks/postmark; make; make install)
	rehash
	mkdir test1
	mkdir test2
	mkdir test3
	mkdir test4
	(cd /partition/test1; postmark)		(run four in parallel)
	(cd /partition/test2; postmark)
	(cd /partition/test3; postmark)
	(cd /partition/test4; postmark)

	Use the following parameters for each postmark:

	set number 30000
	set transactions 4000000
	set size 1500 200000
	run

	(A minimal C loop approximating this create/delete load is sketched
	at the end of this report.)

>Fix:

	None as yet.

>Release-Note:
>Audit-Trail:
>Unformatted:
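For completeness, here is a minimal C stand-in for the postmark load
referenced from How-To-Repeat above.  This is hypothetical, not postmark
itself: each pass creates a small file and immediately unlinks it, which is
precisely the pattern that allocates the diradd/dirrem/freefile dependencies
counted in the vmstat -m output.

/* churn.c - hypothetical stand-in for the postmark load: create,
 * write, and unlink small files forever in the current directory.
 * Build: cc -o churn churn.c
 * Run one instance in each of /partition/test1 .. test4.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	char name[64];
	char buf[1500];		/* the minimum 'set size' above */
	unsigned long i;
	int fd;

	memset(buf, 'x', sizeof(buf));
	for (i = 0; ; i++) {
		/* cycle through 30000 names, like 'set number' above */
		snprintf(name, sizeof(name), "f%lu", i % 30000);
		if ((fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644)) < 0) {
			perror("open");
			exit(1);
		}
		(void) write(fd, buf, sizeof(buf));	/* diradd + allocdirect */
		close(fd);
		unlink(name);		/* dirrem + freefile + freeblks */
	}
	/* NOTREACHED */
}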