From owner-freebsd-stable Mon Jan 28 0:11:22 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from dream.mplik.ru (dream.mplik.ru [195.58.1.132])
	by hub.freebsd.org (Postfix) with ESMTP id D06E137B402
	for ; Mon, 28 Jan 2002 00:11:15 -0800 (PST)
Received: from sight (sight.mplik.ru [195.58.27.104])
	by dream.mplik.ru (8.9.3/8.9.1) with ESMTP id NAA09032;
	Mon, 28 Jan 2002 13:11:01 +0500 (YEKT)
Date: Mon, 28 Jan 2002 13:09:47 +0500
From: Sergey Gershtein
X-Mailer: The Bat! (v1.53bis) Business
Reply-To: Sergey Gershtein
Organization: Ural Relcom Ltd
X-Priority: 3 (Normal)
Message-ID: <1931130530386.20020128130947@ur.ru>
To: Doug White
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re[5]: Strange lock-ups during backup over nfs after adding 1024M RAM
In-Reply-To: <20020126204941.H17540-100000@resnet.uoregon.edu>
References: <20020126204941.H17540-100000@resnet.uoregon.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Sunday, January 27, 2002 you wrote:

>> Today the lock-up happened again during backup over nfs with no sign
>> in any log files or on console. It was running 4.4-STABLE kernel
>> cvsuped yesterday with MAXUSERS 128 and NMBCLUSTERS=8192... Any ideas
>> on where to look?

DW> Get a serial console. Something may be logged but since it doesn't sync to
DW> disk you can't see it afterwards.

There is nothing unusual on the console; I checked that. I don't have a
serial console, but I guess there's nothing wrong with the regular monitor
and keyboard that I use. It was surprising to me that when everything is
locked up you can still type on the keyboard and switch consoles
(Alt-F1..Alt-Fn), but not log in or even reboot the system with Ctrl-Alt-Del.
It just ignores Ctrl-Alt-Del, but still allows typing, scrolling the console
with the arrow keys after pressing Scroll Lock, etc.

There is an additional weird thing that I found -- at least one process in
the system continued running and writing to its log file. It was nmbd, which
continued to answer its queries and was able to write to its log file. All
other log files, including /var/log/cron and the web server logs, stopped at
the time of the lock-up.

>> The strange thing is that we have another server with exactly the same
>> hardware and amount of RAM which works fine. The only difference is
>> that its kernel was compiled with MAXUSERS 1024 for some reason. Do I
>> really need to bump MAXUSERS so high to handle more than 1Gb of RAM?

DW> So *high*? You need to *reduce* it! 128 should work; if that is blowing
DW> up on weird VM failures, start up a crontask that runs 'sysctl vm.zone'
DW> and 'netstat -m' every so often and logs the output. That will say what is
DW> gobbling up all the space.

DW> NFS is also suspicious .. this is after you updated to today's -STABLE?

Currently the server that suffers lock-ups has 1.5 GB of RAM and a
4.4-RELEASE-p4 kernel, cvsuped on Jan 24 and compiled with MAXUSERS 128 and
NMBCLUSTERS=4096. We disabled the backup during the weekend and everything
was fine. But right now, while I was typing this, the lock-up happened again.
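
For anyone who wants to reproduce the periodic logging suggested above, here
is a minimal sketch in Python. It is only an illustration, not the cron job
actually used on this server: the function name `snapshot` and the log path
are hypothetical, and the only things taken from the thread are the two
commands. It forces each snapshot to disk with fsync, since output that never
reaches the disk cannot be read after a lock-up.

```python
"""Append timestamped output of diagnostic commands to a log file."""
import os
import subprocess
import time

# The two commands suggested in the thread.
COMMANDS = [["sysctl", "vm.zone"], ["netstat", "-m"]]
# Hypothetical log location (an assumption); any path on a local disk will do.
LOG_PATH = "/var/log/vmzone-snapshots.log"


def snapshot(commands=COMMANDS, log_path=LOG_PATH):
    """Run each command once and append its output to log_path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        for cmd in commands:
            log.write("=== %s  %s ===\n" % (stamp, " ".join(cmd)))
            try:
                result = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                    timeout=30,  # do not let a hung command block later runs
                )
                log.write(result.stdout)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # Record the failure instead of silently losing the sample.
                log.write("failed: %s\n" % exc)
        # Push the data out of the buffers so it survives a hard lock-up.
        log.flush()
        os.fsync(log.fileno())
```

Saving this as a script with a final `snapshot()` call and running it from
cron every few minutes should leave a usable history in the log file even if
the machine later stops responding.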
I just set up a cron job with 'sysctl vm.zone' and 'netstat -m' and the last
output was the following:

------------------------------------------------------
vm.zone:
ITEM         SIZE     LIMIT     USED     FREE    REQUESTS
PIPE:         160,        0,      28,      74,     517824
SWAPMETA:     160,   385728,       1,      24,          1
unpcb:         64,        0,      29,     227,   15866052
ripcb:        192,     4136,       0,      42,        102
tcpcb:        544,     4136,    2446,    1404,    9889396
udpcb:        192,     4136,      32,     115,    3212780
socket:       192,     4136,    2507,    1462,   28974999
KNOTE:         64,        0,       0,     192,    3139957
NFSNODE:      352,        0,       0,       0,          0
NFSMOUNT:     544,        0,       0,       0,          0
VNODE:        192,        0,  203330,      84,     203330
NAMEI:       1024,        0,       1,      47,  693373551
VMSPACE:      192,        0,     296,     152,     221127
PROC:         416,        0,     300,     141,     221149
DP fakepg:     64,        0,       0,       0,          0
PV ENTRY:      28,  1199982,  192743,  200448,  505651089
MAP ENTRY:     48,        0,    7100,    3398,   40174897
KMAP ENTRY:    48,    96560,    1001,     147,     573777
MAP:          108,        0,       7,       3,          7
VM OBJECT:     96,        0,  203957,      83,    6800770
------------------------------------------------------
241/736/16384 mbufs in use (current/peak/max):
        197 mbufs allocated to data
        44 mbufs allocated to packet headers
175/492/4096 mbuf clusters in use (current/peak/max)
1168 Kbytes allocated to network (9% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
======================================================

The same output right after reboot:

======================================================
vm.zone:
ITEM         SIZE     LIMIT     USED     FREE    REQUESTS
PIPE:         160,        0,      41,      61,       3046
SWAPMETA:     160,   385728,       0,       0,          0
unpcb:         64,        0,      42,     214,      41927
ripcb:        192,     4136,       0,      42,         96
tcpcb:        544,     4136,    2285,     242,      30823
udpcb:        192,     4136,      33,      30,       4686
socket:       192,     4136,    2360,     265,      77534
KNOTE:         64,        0,       1,     127,       4430
NFSNODE:      352,        0,       0,       0,          0
NFSMOUNT:     544,        0,       0,       0,          0
VNODE:        192,        0,    5202,      98,       5202
NAMEI:       1024,        0,       2,      94,    2673554
VMSPACE:      192,        0,     300,     148,       2176
PROC:         416,        0,     304,     137,       2181
DP fakepg:     64,        0,       0,       0,          0
PV ENTRY:      28,  1199982,  170530,  222661,    2468552
MAP ENTRY:     48,        0,    7354,    2464,     125904
KMAP ENTRY:    48,    96560,    1059,     174,       6141
MAP:          108,        0,       7,       3,          7
VM OBJECT:     96,        0,    6682,     168,      37244
------------------------------------------------------
317/960/16384 mbufs in use (current/peak/max):
        266 mbufs allocated to data
        51 mbufs allocated to packet headers
240/686/4096 mbuf clusters in use (current/peak/max)
1612 Kbytes allocated to network (13% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
======================================================

Is there a chance that cvsuping to 4.5-RC and compiling the kernel with
MAXUSERS 0 will make it better?

Regards,
Sergey Gershtein

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
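
Comparing two vm.zone snapshots by eye is error-prone, so here is a rough
sketch of how the comparison could be automated. The helper names
`parse_vm_zone` and `used_growth` are hypothetical, the sample data is four
rows copied from the two snapshots above, and multiplying the change in USED
by the item SIZE is only an approximation of how much each zone grew. It says
which zones grew, not why the machine locks up.

```python
FIELDS = ("size", "limit", "used", "free", "requests")


def parse_vm_zone(text):
    """Parse 'sysctl vm.zone' output into {zone name: {field: int}}."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the ITEM/SIZE/... header has no colon
        parts = [p.strip() for p in rest.split(",")]
        if len(parts) != len(FIELDS) or not all(p.isdigit() for p in parts):
            continue  # skips the bare "vm.zone:" line and anything malformed
        zones[name.strip()] = dict(zip(FIELDS, map(int, parts)))
    return zones


def used_growth(earlier, later):
    """Return (approx_bytes, used_delta, name) rows, largest growth first."""
    rows = []
    for name, new in later.items():
        old = earlier.get(name)
        if old is None:
            continue
        delta = new["used"] - old["used"]
        rows.append((delta * new["size"], delta, name))
    rows.sort(reverse=True)
    return rows


# Four rows copied from the snapshot taken right after reboot.
AFTER_REBOOT = """\
vm.zone:
ITEM SIZE LIMIT USED FREE REQUESTS
tcpcb: 544, 4136, 2285, 242, 30823
VNODE: 192, 0, 5202, 98, 5202
PV ENTRY: 28, 1199982, 170530, 222661, 2468552
VM OBJECT: 96, 0, 6682, 168, 37244
"""

# The same four rows from the last snapshot the cron job logged.
LAST_LOGGED = """\
vm.zone:
ITEM SIZE LIMIT USED FREE REQUESTS
tcpcb: 544, 4136, 2446, 1404, 9889396
VNODE: 192, 0, 203330, 84, 203330
PV ENTRY: 28, 1199982, 192743, 200448, 505651089
VM OBJECT: 96, 0, 203957, 83, 6800770
"""

for approx_bytes, delta, name in used_growth(
    parse_vm_zone(AFTER_REBOOT), parse_vm_zone(LAST_LOGGED)
):
    print("%-10s USED %+8d  ~%d KB" % (name, delta, approx_bytes // 1024))
```

On these four rows the VNODE and VM OBJECT zones show by far the largest
increase in USED between the two snapshots, while tcpcb barely moves.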