From owner-freebsd-bugs Thu Jul 13 23:21:24 1995
Return-Path: bugs-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id XAA18558 for bugs-outgoing; Thu, 13 Jul 1995 23:21:24 -0700
Received: from blob.best.net (blob.best.net [204.156.128.88]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id XAA18552 for ; Thu, 13 Jul 1995 23:21:23 -0700
Received: (dillon@localhost) by blob.best.net (8.6.12/8.6.5) id XAA27589; Thu, 13 Jul 1995 23:21:21 -0700
Date: Thu, 13 Jul 1995 23:21:21 -0700
From: Matt Dillon
Message-Id: <199507140621.XAA27589@blob.best.net>
To: bugs@freebsd.org
Subject: Another clue. (ufslk2/newbuf deadlock crash)
Sender: bugs-owner@freebsd.org
Precedence: bulk

Oops, I meant to send this to bugs@freebsd.org so everyone would see it.

This is another addendum to my ufslk2/newbuf deadlock / crashing bug report. To recap: the machine locks up with nearly all the processes stuck in either newbuf or ufslk2 sleeps. It is a heavily loaded Pentium, 130+ users online, 128MB of RAM, NCR SCSI, around 12G of disk. NFS is used heavily as well (mainly as an NFS client).

The buffer code (kern/vfs_bio.c) seems to skip the EMPTY queue if it thinks too much space is in use... but then it may wind up sleeping on needsbuffer. Could this be why it seems to 'run out of buffers' and then deadlock?

I wrote a little program to count the bufs on each of the queues and came up with:

    shell1:/home/dillon# /tmp/bq
    none 0
    lock 0
    lru 229
    vmio 0
    age 0
    empt 825

By the numbers, it seems a definite possibility. I also did a gdb -k kernel /dev/mem and printed out 'bufspace', 'maxbufspace', and 'nbuf':

    (kgdb) print bufspace
    $1 = 8702976
    (kgdb) print maxbufspace
    $2 = 8699904
    (kgdb) print nbuf
    $1 = 1054

bufspace > maxbufspace seems a perpetual condition (this is on a machine with 128MB of RAM). I do not quite understand how 'bufspace' can be so large when only 229 buffers (800K) are in use unless the clustering is somehow eating up the space (128 buffers x maxclustersize ?? it gets a bit confusing here). It would seem that increasing nbuf will not help though, since there are plenty of empty buffers.

I guess my question is: Am I barking up the wrong tree, or can heavy usage cause the cascade lockup using the above scenario?

-Matt