Date: Mon, 28 Jan 2002 13:09:47 +0500
From: Sergey Gershtein <sg@ur.ru>
To: Doug White <dwhite@resnet.uoregon.edu>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re[5]: Strange lock-ups during backup over nfs after adding 1024M RAM
Message-ID: <1931130530386.20020128130947@ur.ru>
In-Reply-To: <20020126204941.H17540-100000@resnet.uoregon.edu>
References: <20020126204941.H17540-100000@resnet.uoregon.edu>
On Sunday, January 27, 2002 you wrote:
>> Today the lock-up happened again during backup over nfs with no sign
>> in any log files or on console. It was running 4.4-STABLE kernel
>> cvsuped yesterday with MAXUSERS 128 and NMBCLUSTERS=8192... Any ideas
>> on where to look?
DW> Get a serial console. Something may be logged but since it doesn't sync to
DW> disk you can't see it afterwards.
There is nothing unusual on the console; I checked that. I don't have
a serial console, but I guess there's nothing wrong with the regular
monitor and keyboard that I use. It surprised me that when
everything is locked up I can still type on the keyboard and switch
consoles (Alt-F1..Alt-Fn), but cannot log in or even reboot the system
with Ctrl-Alt-Del. The system just ignores Ctrl-Alt-Del, but still lets
me type, scroll the console with the arrow keys after pressing Scroll
Lock, etc.
There is one more weird thing that I found -- at least one process in
the system continued running and writing to its log file. It was nmbd,
which continued to answer queries and was able to write to its log
file. All other log files, including /var/log/cron and the web server
logs, stopped at the time of the lock-up.
>> The strange thing is that we have another server with exactly the same
>> hardware and amount of RAM which works fine. The only difference is
>> that its kernel was compiled with MAXUSERS 1024 for some reason. Do I
>> really need to bump MAXUSERS so high to handle more than 1 GB of RAM?
DW> So *high*? You need to *reduce* it! 128 should work; if that is blowing
DW> up on weird VM failures, start up a cron task that runs 'sysctl vm.zone'
DW> and 'netstat -m' every so often and logs the output. That will say what is
DW> gobbling up all the space.
DW> NFS is also suspicious .. this is after you updated to today's -STABLE?
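A minimal sketch of the periodic logging suggested above, assuming a Python interpreter is available on the server. Only the two commands, 'sysctl vm.zone' and 'netstat -m', come from the thread; the function names and the log path in the usage note are hypothetical. Each record is followed by fsync(), because a record that never reaches the disk is lost after a lock-up. A hung system may still fail to complete the write.

```python
import os
import subprocess
from datetime import datetime

# The two commands named in the thread.
COMMANDS = (["sysctl", "vm.zone"], ["netstat", "-m"])


def snapshot(commands=COMMANDS, run=subprocess.run):
    """Run each command once and return one timestamped text record."""
    parts = ["=== " + datetime.now().isoformat(timespec="seconds") + " ==="]
    for argv in commands:
        label = "$ " + " ".join(argv)
        try:
            result = run(argv, capture_output=True, text=True, timeout=30)
            parts.append(label)
            parts.append(result.stdout.rstrip())
        except (OSError, subprocess.SubprocessError) as exc:
            # Record the failure instead of aborting the whole snapshot.
            parts.append(label + " failed: " + str(exc))
    return "\n".join(parts) + "\n"


def append_snapshot(path, text):
    """Append a record and ask the OS to push it to disk immediately."""
    with open(path, "a") as log:
        log.write(text)
        log.flush()
        os.fsync(log.fileno())
```

A cron job could then run a one-line script calling `append_snapshot("/var/log/vmzone.log", snapshot())` every few minutes; that path is only an example.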
Currently the server that suffers lock-ups has 1.5 GB of RAM and a
4.4-RELEASE-p4 kernel cvsuped on Jan 24 and compiled with MAXUSERS 128
and NMBCLUSTERS=4096. We disabled the backup during the weekend and
everything was fine. But just now, while I was typing this, the
lock-up happened again. I just set up a cron job with 'sysctl
vm.zone' and 'netstat -m', and the last output was the following:
------------------------------------------------------
vm.zone:
ITEM         SIZE    LIMIT    USED    FREE   REQUESTS
PIPE:         160,       0,     28,     74,    517824
SWAPMETA:     160,  385728,      1,     24,         1
unpcb:         64,       0,     29,    227,  15866052
ripcb:        192,    4136,      0,     42,       102
tcpcb:        544,    4136,   2446,   1404,   9889396
udpcb:        192,    4136,     32,    115,   3212780
socket:       192,    4136,   2507,   1462,  28974999
KNOTE:         64,       0,      0,    192,   3139957
NFSNODE:      352,       0,      0,      0,         0
NFSMOUNT:     544,       0,      0,      0,         0
VNODE:        192,       0, 203330,     84,    203330
NAMEI:       1024,       0,      1,     47, 693373551
VMSPACE:      192,       0,    296,    152,    221127
PROC:         416,       0,    300,    141,    221149
DP fakepg:     64,       0,      0,      0,         0
PV ENTRY:      28, 1199982, 192743, 200448, 505651089
MAP ENTRY:     48,       0,   7100,   3398,  40174897
KMAP ENTRY:    48,   96560,   1001,    147,    573777
MAP:          108,       0,      7,      3,         7
VM OBJECT:     96,       0, 203957,     83,   6800770
------------------------------------------------------
241/736/16384 mbufs in use (current/peak/max):
197 mbufs allocated to data
44 mbufs allocated to packet headers
175/492/4096 mbuf clusters in use (current/peak/max)
1168 Kbytes allocated to network (9% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
======================================================
The same output right after reboot:
======================================================
vm.zone:
ITEM         SIZE    LIMIT    USED    FREE   REQUESTS
PIPE:         160,       0,     41,     61,      3046
SWAPMETA:     160,  385728,      0,      0,         0
unpcb:         64,       0,     42,    214,     41927
ripcb:        192,    4136,      0,     42,        96
tcpcb:        544,    4136,   2285,    242,     30823
udpcb:        192,    4136,     33,     30,      4686
socket:       192,    4136,   2360,    265,     77534
KNOTE:         64,       0,      1,    127,      4430
NFSNODE:      352,       0,      0,      0,         0
NFSMOUNT:     544,       0,      0,      0,         0
VNODE:        192,       0,   5202,     98,      5202
NAMEI:       1024,       0,      2,     94,   2673554
VMSPACE:      192,       0,    300,    148,      2176
PROC:         416,       0,    304,    137,      2181
DP fakepg:     64,       0,      0,      0,         0
PV ENTRY:      28, 1199982, 170530, 222661,   2468552
MAP ENTRY:     48,       0,   7354,   2464,    125904
KMAP ENTRY:    48,   96560,   1059,    174,      6141
MAP:          108,       0,      7,      3,         7
VM OBJECT:     96,       0,   6682,    168,     37244
------------------------------------------------------
317/960/16384 mbufs in use (current/peak/max):
266 mbufs allocated to data
51 mbufs allocated to packet headers
240/686/4096 mbuf clusters in use (current/peak/max)
1612 Kbytes allocated to network (13% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
======================================================
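One way to read the two vm.zone listings side by side is to rank zones by how much their USED count grew. The sketch below is an illustration, not part of the original report: the function names are hypothetical, and only four rows are copied from the listings above to keep it short. It multiplies the change in USED by the per-item SIZE to estimate the extra bytes held in each zone.

```python
# Rows copied from the two listings above (a subset, for brevity).
AFTER_REBOOT = """\
VNODE: 192, 0, 5202, 98, 5202
VM OBJECT: 96, 0, 6682, 168, 37244
tcpcb: 544, 4136, 2285, 242, 30823
PV ENTRY: 28, 1199982, 170530, 222661, 2468552
"""

AT_LOCKUP = """\
VNODE: 192, 0, 203330, 84, 203330
VM OBJECT: 96, 0, 203957, 83, 6800770
tcpcb: 544, 4136, 2446, 1404, 9889396
PV ENTRY: 28, 1199982, 192743, 200448, 505651089
"""


def parse_vm_zone(text):
    """Parse 'NAME: size, limit, used, free, requests' rows into a dict."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = [f.strip() for f in rest.split(",")]
        if not sep or len(fields) != 5 or not all(f.isdigit() for f in fields):
            continue  # skip the header, separators, and blank lines
        size, limit, used, free, requests = map(int, fields)
        zones[name.strip()] = {"size": size, "limit": limit, "used": used,
                               "free": free, "requests": requests}
    return zones


def growth(before, after):
    """Return (zone, used_before, used_after, extra_bytes), largest first."""
    rows = []
    for name, a in after.items():
        b = before.get(name)
        if b is None:
            continue  # zone missing from the baseline sample
        rows.append((name, b["used"], a["used"],
                     (a["used"] - b["used"]) * a["size"]))
    return sorted(rows, key=lambda row: row[3], reverse=True)


for row in growth(parse_vm_zone(AFTER_REBOOT), parse_vm_zone(AT_LOCKUP)):
    print("%-10s used %7d -> %7d  (%+d bytes)" % row)
```

On these rows VNODE and VM OBJECT come out on top, with roughly 38 million and 19 million extra bytes respectively. The two samples were taken at very different uptimes, so the growth by itself does not show what causes the lock-up.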
Is there a chance that cvsuping to 4.5-RC and compiling the kernel
with MAXUSERS 0 will make it better?
Regards,
Sergey Gershtein
