From owner-freebsd-stable  Mon Jan 28  0:11:22 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from dream.mplik.ru (dream.mplik.ru [195.58.1.132])
	by hub.freebsd.org (Postfix) with ESMTP id D06E137B402
	for <freebsd-stable@FreeBSD.ORG>; Mon, 28 Jan 2002 00:11:15 -0800 (PST)
Received: from sight (sight.mplik.ru [195.58.27.104])
	by dream.mplik.ru (8.9.3/8.9.1) with ESMTP id NAA09032;
	Mon, 28 Jan 2002 13:11:01 +0500 (YEKT)
Date: Mon, 28 Jan 2002 13:09:47 +0500
From: Sergey Gershtein <sg@ur.ru>
X-Mailer: The Bat! (v1.53bis) Business
Reply-To: Sergey Gershtein <sg@ur.ru>
Organization: Ural Relcom Ltd
X-Priority: 3 (Normal)
Message-ID: <1931130530386.20020128130947@ur.ru>
To: Doug White <dwhite@resnet.uoregon.edu>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re[5]: Strange lock-ups during backup over nfs after adding 1024M RAM
In-Reply-To: <20020126204941.H17540-100000@resnet.uoregon.edu>
References: <20020126204941.H17540-100000@resnet.uoregon.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

On Sunday, January 27, 2002 you wrote:
>> Today the lock-up happened again during backup over nfs with no sign
>> in any log files or on console. It was running 4.4-STABLE kernel
>> cvsuped yesterday with MAXUSERS 128 and NMBCLUSTERS=8192... Any ideas
>> on where to look?

DW> Get a serial console. Something may be logged but since it doesn't sync to
DW> disk you can't see it afterwards.

There is nothing unusual on the console; I checked that. I don't have
a serial console, but I guess the regular monitor and keyboard that I
use should do. What surprised me is that while everything is locked up
you can still type on the keyboard and switch consoles (Alt-F1..Alt-Fn),
but you cannot log in or even reboot the system with Ctrl-Alt-Del. The
system just ignores Ctrl-Alt-Del, yet it still lets you type, scroll
the console with the arrow keys after pressing Scroll Lock, etc.

There is one more weird thing that I found -- at least one process in
the system continued running and writing to its log file. It was nmbd,
which kept answering its queries and logging them. All other log
files, including /var/log/cron and the web server logs, stopped at the
time of the lock-up.

>> The strange thing is that we have another server with exactly the same
>> hardware and amount of RAM which works fine.  The only difference is
>> that its kernel was compiled with MAXUSERS 1024 for some reason.  Do I
>> really need to bump MAXUSERS so high to handle more than 1Gb of RAM?

DW> So *high*? You need to *reduce* it!  128 should work; if that is blowing
DW> up on weird VM failures, start up a cron task that runs 'sysctl vm.zone'
DW> and 'netstat -m' every so often and logs the output. That will say what is
DW> gobbling up all the space.

DW> NFS is also suspicious .. this is after you updated to today's -STABLE?

Currently the server that suffers lock-ups has 1.5 GB of RAM and a
4.4-RELEASE-p4 kernel cvsuped on Jan 24 and compiled with MAXUSERS 128
and NMBCLUSTERS=4096.  We disabled the backup during the weekend and
everything was fine.  But just now, while I was typing this, the
lock-up happened again.  I had just set up a cron job that runs
'sysctl vm.zone' and 'netstat -m', and the last output was the
following:

======================================================
vm.zone:
ITEM            SIZE     LIMIT    USED    FREE  REQUESTS 
                                                         
PIPE:            160,        0,     28,     74,   517824 
SWAPMETA:        160,   385728,      1,     24,        1 
unpcb:            64,        0,     29,    227, 15866052 
ripcb:           192,     4136,      0,     42,      102 
tcpcb:           544,     4136,   2446,   1404,  9889396 
udpcb:           192,     4136,     32,    115,  3212780 
socket:          192,     4136,   2507,   1462, 28974999 
KNOTE:            64,        0,      0,    192,  3139957 
NFSNODE:         352,        0,      0,      0,        0 
NFSMOUNT:        544,        0,      0,      0,        0 
VNODE:           192,        0, 203330,     84,   203330 
NAMEI:          1024,        0,      1,     47, 693373551
VMSPACE:         192,        0,    296,    152,   221127 
PROC:            416,        0,    300,    141,   221149 
DP fakepg:        64,        0,      0,      0,        0 
PV ENTRY:         28,  1199982, 192743, 200448, 505651089
MAP ENTRY:        48,        0,   7100,   3398, 40174897 
KMAP ENTRY:       48,    96560,   1001,    147,   573777 
MAP:             108,        0,      7,      3,        7 
VM OBJECT:        96,        0, 203957,     83,  6800770 
------------------------------------------------------
241/736/16384 mbufs in use (current/peak/max):
        197 mbufs allocated to data                    
        44 mbufs allocated to packet headers           
175/492/4096 mbuf clusters in use (current/peak/max)   
1168 Kbytes allocated to network (9% of mb_map in use) 
0 requests for memory denied                           
0 requests for memory delayed                          
0 calls to protocol drain routines                     
====================================================== 
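
For reference, the kind of cron job that collects output like the
above could look roughly like the sketch below. It is only an
illustration, not the exact job I run: the log path is a hypothetical
name, and the fsync() call is my assumption about how to improve the
chance that the last snapshot reaches the disk before a lock-up.

```python
import os
import subprocess
import time

# The two commands suggested in this thread.
COMMANDS = [["sysctl", "vm.zone"], ["netstat", "-m"]]
# Hypothetical log location; pick whatever suits the system.
LOG_PATH = "/var/log/vmzone-snapshots.log"


def snapshot(commands, log_path):
    """Append a timestamped dump of each command's output to log_path."""
    with open(log_path, "a") as log:
        log.write("=" * 54 + "\n")
        log.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
        for cmd in commands:
            try:
                out = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                ).stdout
            except OSError as exc:
                # The command is missing or cannot be started; log that.
                out = "failed to run %s: %s\n" % (" ".join(cmd), exc)
            log.write("-" * 54 + "\n")
            log.write(out)
        # Push the data out of the process and ask the kernel to write it.
        log.flush()
        os.fsync(log.fileno())


if __name__ == "__main__":
    snapshot(COMMANDS, LOG_PATH)
```

Run from cron every few minutes, this appends one block per run, so
the last block in the file is the last state seen before a lock-up.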

The same output right after reboot:

======================================================
vm.zone:
ITEM            SIZE     LIMIT    USED    FREE  REQUESTS 
                                                         
PIPE:            160,        0,     41,     61,     3046 
SWAPMETA:        160,   385728,      0,      0,        0 
unpcb:            64,        0,     42,    214,    41927 
ripcb:           192,     4136,      0,     42,       96 
tcpcb:           544,     4136,   2285,    242,    30823 
udpcb:           192,     4136,     33,     30,     4686 
socket:          192,     4136,   2360,    265,    77534 
KNOTE:            64,        0,      1,    127,     4430 
NFSNODE:         352,        0,      0,      0,        0 
NFSMOUNT:        544,        0,      0,      0,        0 
VNODE:           192,        0,   5202,     98,     5202 
NAMEI:          1024,        0,      2,     94,  2673554 
VMSPACE:         192,        0,    300,    148,     2176 
PROC:            416,        0,    304,    137,     2181 
DP fakepg:        64,        0,      0,      0,        0 
PV ENTRY:         28,  1199982, 170530, 222661,  2468552 
MAP ENTRY:        48,        0,   7354,   2464,   125904 
KMAP ENTRY:       48,    96560,   1059,    174,     6141 
MAP:             108,        0,      7,      3,        7
VM OBJECT:        96,        0,   6682,    168,    37244
------------------------------------------------------  
317/960/16384 mbufs in use (current/peak/max):          
        266 mbufs allocated to data                     
        51 mbufs allocated to packet headers            
240/686/4096 mbuf clusters in use (current/peak/max)    
1612 Kbytes allocated to network (13% of mb_map in use) 
0 requests for memory denied                            
0 requests for memory delayed                           
0 calls to protocol drain routines                      
======================================================  
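
To compare the two vm.zone dumps without reading them by eye, a small
script along the following lines might help. It is a rough sketch
under my own assumptions: the function names are made up for
illustration, and multiplying USED by SIZE is only an estimate of the
memory held by each zone, not an exact accounting.

```python
import re

# One vm.zone row looks like: "NAME:  SIZE, LIMIT, USED, FREE, REQUESTS".
# The header line and the "vm.zone:" line do not match and are skipped.
ROW = re.compile(
    r"^\s*([^:]+):\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\s*$"
)


def parse_vm_zone(text):
    """Return {zone: (size, limit, used, free, requests)}."""
    zones = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            zones[m.group(1).strip()] = tuple(int(g) for g in m.groups()[1:])
    return zones


def growth(before, after):
    """Return (zone, used_before, used_after, extra_bytes) tuples,
    sorted so that the zone with the largest estimated growth is first."""
    rows = []
    for name, (size, _limit, used, _free, _requests) in after.items():
        # A zone missing from the first dump counts as zero items used.
        old_used = before.get(name, (size, 0, 0, 0, 0))[2]
        rows.append((name, old_used, used, (used - old_used) * size))
    return sorted(rows, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    # A few rows copied from the two dumps in this message.
    after_reboot = """
VNODE:           192,        0,   5202,     98,     5202
PV ENTRY:         28,  1199982, 170530, 222661,  2468552
VM OBJECT:        96,        0,   6682,    168,    37244
"""
    at_lockup = """
VNODE:           192,        0, 203330,     84,   203330
PV ENTRY:         28,  1199982, 192743, 200448, 505651089
VM OBJECT:        96,        0, 203957,     83,  6800770
"""
    result = growth(parse_vm_zone(after_reboot), parse_vm_zone(at_lockup))
    for name, old, new, extra in result:
        print("%-10s %7d -> %7d  (+%d bytes)" % (name, old, new, extra))
```

On the dumps above the zones that stand out are VNODE (USED goes from
5202 to 203330) and VM OBJECT (6682 to 203957), which by this estimate
is roughly 38 MB and 19 MB more than right after reboot.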

Is there a chance that cvsuping to 4.5-RC and compiling the kernel
with MAXUSERS 0 will make it better?

Regards,
Sergey Gershtein

