From owner-freebsd-bugs  Thu Jul 13 23:21:24 1995
Return-Path: bugs-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.10/8.6.6) id XAA18558
          for bugs-outgoing; Thu, 13 Jul 1995 23:21:24 -0700
Received: from blob.best.net (blob.best.net [204.156.128.88])
          by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id XAA18552
          for <bugs@freebsd.org>; Thu, 13 Jul 1995 23:21:23 -0700
Received: (dillon@localhost) by blob.best.net (8.6.12/8.6.5) id XAA27589; Thu, 13 Jul 1995 23:21:21 -0700
Date: Thu, 13 Jul 1995 23:21:21 -0700
From: Matt Dillon <dillon@blob.best.net>
Message-Id: <199507140621.XAA27589@blob.best.net>
To: bugs@freebsd.org
Subject: Another clue. (ufslk2/newbuf deadlock crash)
Sender: bugs-owner@freebsd.org
Precedence: bulk

    Oops, I meant to send this to bugs@freebsd.org so everyone would see it.

    This is another addendum to my ufslk2/newbuf deadlock / crashing bug report.
    To recap: the machine locks up with nearly all the processes stuck in either
    newbuf or ufslk2 sleeps.  It is a heavily loaded Pentium with 130+ users
    online, 128MB of RAM, NCR SCSI, and around 12G of disk.  NFS is used heavily
    as well (mainly as an NFS client).

    The buffer code (kern/vfs_bio.c) seems to skip the EMPTY queue if it thinks
    too much space is in use... but then it may wind up sleeping on
    needsbuffer.  Could this be why it seems to 'run out of buffers' and
    then deadlock?
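
    To make the suspected path concrete, here is a minimal user-space model
    of the decision as I describe it above.  It is a sketch only, not the
    code in kern/vfs_bio.c: the function model_getnewbuf(), the enum, and the
    struct are hypothetical names, and the idea that the AGE/LRU buffers may
    all be unreclaimable at that moment is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes for this model; not kernel identifiers. */
enum pick { PICK_EMPTY, PICK_AGE, PICK_LRU, PICK_SLEEP };

/* Snapshot of the counters mentioned in the report. */
struct bufstate {
    long bufspace;     /* bytes currently accounted to buffers */
    long maxbufspace;  /* limit on that accounting */
    int  n_empty;      /* buffers on the EMPTY queue */
    int  n_age_free;   /* reclaimable buffers on the AGE queue (assumed) */
    int  n_lru_free;   /* reclaimable buffers on the LRU queue (assumed) */
};

/* Decide where a new buffer would come from under the suspected logic. */
enum pick
model_getnewbuf(const struct bufstate *s)
{
    assert(s != NULL);

    /* EMPTY is consulted only while space accounting is under the limit. */
    if (s->bufspace < s->maxbufspace && s->n_empty > 0)
        return PICK_EMPTY;
    /* Otherwise try to recycle an existing buffer. */
    if (s->n_age_free > 0)
        return PICK_AGE;
    if (s->n_lru_free > 0)
        return PICK_LRU;
    /* Nothing reclaimable: the caller would sleep on needsbuffer. */
    return PICK_SLEEP;
}
```

    In this model, a state with bufspace over maxbufspace and no reclaimable
    AGE or LRU buffers yields PICK_SLEEP even with hundreds of buffers sitting
    on the EMPTY queue, which is the situation I suspect.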

    I wrote a little program to count the bufs on each of the queues and came
    up with:

    shell1:/home/dillon# /tmp/bq
    none 0
    lock 0
    lru  229
    vmio 0
    age  0
    empt 825

    By the numbers, it seems a definite possibility.  I also did a gdb -k kernel /dev/mem
    and printed out 'bufspace', 'maxbufspace', and 'nbuf':

    (kgdb) print bufspace
    $1 = 8702976
    (kgdb) print maxbufspace
    $2 = 8699904
    (kgdb) print nbuf
    $1 = 1054

    bufspace > maxbufspace seems a perpetual condition (this is on a machine with 128MB
    of RAM).  I do not quite understand how 'bufspace' can be so large when only
    229 buffers (800K) are in use, unless the clustering is somehow eating up the space
    (128 buffers x maxclustersize?  I get a bit confused here).
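
    The figures above can be cross-checked with a few lines of arithmetic.
    The helpers below are a sketch with hypothetical names (queue_total,
    space_overshoot, avg_bytes_per_buf); the last one assumes, perhaps
    wrongly, that all of bufspace is charged to the buffers on the LRU queue.

```c
#include <assert.h>

/* Sum the per-queue counts printed by bq. */
int
queue_total(const int *counts, int nqueues)
{
    int i, total = 0;

    for (i = 0; i < nqueues; i++)
        total += counts[i];
    return total;
}

/* Bytes by which bufspace exceeds maxbufspace (negative if under). */
long
space_overshoot(long bufspace, long maxbufspace)
{
    return bufspace - maxbufspace;
}

/* Average bytes per buffer if all of bufspace belonged to nbufs buffers. */
long
avg_bytes_per_buf(long bufspace, int nbufs)
{
    assert(nbufs > 0);
    return bufspace / nbufs;
}
```

    With the reported values, the six queue counts sum to 1054, which matches
    nbuf; bufspace is 3072 bytes over maxbufspace; and spreading bufspace over
    the 229 LRU buffers gives 38004 bytes (about 37K) per buffer, roughly ten
    times what the 800K estimate implies.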

    It would seem that increasing nbuf will not help though, since there are plenty of
    empty buffers.

    I guess my question is: Am I barking up the wrong tree, or can heavy usage cause the
    cascade lockup via the above scenario?

					-Matt