Date: Thu, 30 Dec 1999 19:37:26 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Tom <tom@uniserve.com>
Cc: Peter Wemm <peter@netplex.com.au>, freebsd-stable@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: Re: softupdates and debug.max_softdeps
Message-ID: <199912310337.TAA79239@apollo.backplane.com>
References: <Pine.BSF.4.02A.9912301820230.9644-100000@shell.uniserve.ca>
: I also don't think "sync" is a fix either.  I expect "sync" to reclaim
:unused space.  For instance, the file system currently shows 9 GB in use
:with "df", but there is only about 5 GB actually present on the disk.  I
:ran "sync", and I expected "df" to report about 5 GB used, but it doesn't
:seem to change anything.  I'm going to try sync again tomorrow once the
:unreclaimed space is about 30 GB or so, and see if it does anything.

    Try lots of syncs ... like one a second :-).  One sync won't do it.

    But what we really want to do is make the thing crash and hopefully
    (with the serial console, maybe) get a panic message.  Conventionally,
    what should be occurring is that the kernel is running out of some
    memory pool.  If this is what is occurring, it should generate a panic
    message before rebooting.

    A couple of other things you can do:

    Compile the kernel with DDB configured so the system drops into DDB
    instead of panicking (only do this if you have access to the console).
    Then you should be able to 'trace' and 'ps' before typing 'panic'
    <return> manually.  Type as many <return>s as necessary after that, but
    be careful: you don't want to interrupt a kernel dump if the kernel has
    started one!

    Using several local xterms with a large scrollback buffer configured,
    ssh to the machine under test and set up a couple of csh while(1) loops
    to look at various kernel resources, e.g.

        while (1)
            vmstat -z; vmstat -m
        end

    The reason you use a local xterm in which you ssh to the remote machine
    is so the xterm doesn't disappear on you when the remote machine
    crashes :-).  A tail -f /var/log/messages will probably *NOT* spit out
    the panic message quickly enough, but a true serial console (not just a
    getty running on the port) should spit it out just fine.
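Matt's "one sync a second" suggestion is normally just a shell loop. As a minimal C sketch of the same idea, assuming only the POSIX sync(2) and sleep(3) calls, the hypothetical helper sync_repeatedly below issues a bounded number of syncs with a pause between them (bounded so it terminates, unlike a while(1) loop):

```c
#include <unistd.h>   /* sync(2), sleep(3) */

/* Hypothetical helper (not from the original message): call sync(2)
 * `count` times, sleeping `interval_sec` seconds between calls.
 * Returns the number of sync(2) calls issued. */
static int sync_repeatedly(int count, unsigned int interval_sec)
{
    int issued = 0;
    for (int i = 0; i < count; i++) {
        sync();                 /* ask the kernel to flush dirty data */
        issued++;
        if (i + 1 < count)      /* no pause after the last call */
            sleep(interval_sec);
    }
    return issued;
}
```

For example, sync_repeatedly(60, 1) issues roughly a minute's worth of syncs at one per second; comparing "df" output before and after is the check Tom describes.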
: One thing that is interesting is that the following sysctl variables are
:always zero:
:
:debug.blk_limit_push: 0
:debug.ino_limit_push: 0
:debug.blk_limit_hit: 0
:debug.ino_limit_hit: 0
:debug.rush_requests: 0
:
: So it doesn't look like softupdates is rushing things out.

    These aren't very useful unless you only have a tiny bit of main
    memory.  For all practical purposes the limit is usually never reached
    (which is probably why it's buggy when it *is* reached).

: "vmstat -m" is showing that the storage for "inodedep" is steadily
:increasing.
:
: I _think_ I need to increase tick_delay, so when the max_softdeps limit
:is finally hit, syncer gets run for a while and cleans things up.

    tick_delay will probably not have much of an effect.  Look at the
    vmstat -m output carefully as you run the test (as suggested above).
    Bad things happen if you run the kernel out of KVM (kernel virtual
    memory), and that can happen even if you have plenty of normal RAM.
    There are *TWO* limits involved: the limit for the memory pool you are
    observing, and a global limit on the grand total, which is nominally
    2x the per-pool limit.  If either limit is reached, the machine is
    hosed.

:Tom
:Uniserve

                                        -Matt
                                        Matthew Dillon
                                        <dillon@backplane.com>
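The two-limit point in Matt's reply can be illustrated with a small model. This is a hypothetical C sketch, not FreeBSD kernel code: struct pool_usage and would_exhaust are invented names, and the global ceiling is taken as exactly 2x the per-pool limit based on Matt's "nominally 2x". It shows why a pool that looks healthy in vmstat -m can still take the machine down once the grand total hits its ceiling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (not kernel structures): usage of one memory pool,
 * e.g. "inodedep" as reported by vmstat -m, plus the total across all pools. */
struct pool_usage {
    size_t pool_bytes;   /* bytes in use by this pool */
    size_t total_bytes;  /* bytes in use by all pools combined */
};

/* Returns true if allocating `request` more bytes in this pool would
 * reach either the per-pool limit or the global limit. */
static bool would_exhaust(const struct pool_usage *u, size_t request,
                          size_t per_pool_limit)
{
    size_t global_limit = 2 * per_pool_limit;            /* "nominally 2x" */
    bool pool_hit   = u->pool_bytes  + request >= per_pool_limit;
    bool global_hit = u->total_bytes + request >= global_limit;
    return pool_hit || global_hit;                       /* either one is fatal */
}
```

With a per-pool limit of 100, a pool using only 10 units still trips the check once the grand total reaches 200, which is the case Matt warns about when only one pool is being watched.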