From owner-freebsd-current@FreeBSD.ORG Tue Nov 3 14:18:53 2009
Date: Tue, 3 Nov 2009 08:18:52 -0600
Message-ID: <6201873e0911030618u3984a7c1hc05331b0021f796f@mail.gmail.com>
References: <1257185816.44755.29.camel@buffy.york.ac.uk>
From: Adam Vande More
To: Weldon S Godfrey 3
Cc: Gavin Atkinson, freebsd-current@freebsd.org
Subject: Re: FreeBSD 8.0 - network stack crashes?
Content-Type: text/plain; charset=ISO-8859-1
List-Id: Discussions about the use of FreeBSD-current

On Tue, Nov 3, 2009 at 7:32 AM, Weldon S Godfrey 3 wrote:

>
> If memory serves me right, sometime around Yesterday, Gavin Atkinson told
> me:
>
> Gavin, thank you A LOT for helping us with this, I have answered as much as
> I can from the most recent crash below. We did hit max mbufs. It is at
> 25Kclusters, which is the default. I have upped it to 32K because a rather
> old article mentioned that as the top end and I need to get into work so I
> am not trying to do this with a remote console to go higher. I have already
> set it to reboot next with 64K clusters. I already have kmem maxed to what
> is bootable (or at least at one time) in 8.0, 4GB, how high can I safely go?
> This is a NFS server running ZFS with sustained 5 min averages of
> 120-200Mb/s running as a store for a mail system.
>
>
>> Some things that would be useful:
>>
>> - Does "arp -da" fix things?
>
> no, it hangs like ssh, route add, etc
>
>
>> - What's the output of "netstat -m" while the networking is broken?
>
> Tue Nov 3 07:02:11 CST 2009
> 36971/2033/39004 mbufs in use (current/cache/total)
> 24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
> 24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 58980K/2110K/61091K bytes allocated to network (current/cache/total)
> 0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
>
>> - What does CTRL-T show for the hung SSH or route processes?
>
> of the arp:
> load: 0.01 cmd: arp 6144 [zonelimit] 0.00u 0.00s 0% 996k
>
>
>> - What does "procstat -kk" on the same processes show?
>
> sorry I couldn't get this to run this time, remote console issues
>
>
>> - Does going to single user mode ("init 1" and killing off any leftover
>> processes) cause the machine to start working again? If so, what's the
>> output of "netstat -m" afterwards?
>
> no, mbuf was still maxed out
>
>
> below is the last vmstat -m
>
>          Type  InUse   MemUse HighUse  Requests  Size(s)
>   ntfs_nthash      1     512K       -         1
>     pfs_nodes     20       5K       -        20  256
>          GEOM    262      52K       -      4551  16,32,64,128,256,512,1024,2048
>        isadev      9       2K       -         9  128
>          cdev     13       4K       -        13  256
>         sigio      1       1K       -         1  64
>      filedesc    127      64K       -      6412  512,1024
>          kenv     75      11K       -        80  16,32,64,128
>        kqueue      0       0K       -       188  256,2048
>     proc-args     41       2K       -      5647  16,32,64,128
>       scsi_cd      0       0K       -       333  16
>       ithread    119      21K       -       119  32,128,256
>        acpica    888      78K       -    121045  16,32,64,128,256,512,1024
>        KTRACE    100      13K       -       100  128
>      acpitask      0       0K       -         1  64
>        linker    139     596K       -       181  16,32,64,128,256,512,1024,2048
>         lockf     11       2K       -       399  64,128
> CAM dev queue      4       1K       -         4  128
>        ip6ndp      5       1K       -         5  64,128
>          temp     48     562K       -  14544952  16,32,64,128,256,512,1024,2048,4096
>        devbuf  17105   36341K       -     24988  16,32,64,128,512,1024,2048,4096
>        module    420      53K       -       420  128
>      mtx_pool      1       8K       -         1
>           osd      2       1K       -         2  16
>     CAM queue     62      52K       -      2211  16,32,64,128,256,512,1024,2048
>       subproc    562     722K       -      6851  512,4096
>          proc      2      16K       -         2
>       session     33       5K       -       127  128
>          pgrp     37       5K       -       190  128
>          cred     62      16K       -  29192756  256
>       uidinfo      4       3K       -        99  64,2048
>        plimit     17       5K       -       910  256
>       acpisem     15       1K       -        15  64
>     sysctltmp      0       0K       -     13867  16,32,64,128,256,512,1024,2048,4096
>     sysctloid   5400     270K       -      5782  16,32,64,128
>        sysctl      0       0K       -     11423  16,32,64
>       callout      7    3584K       -         7
>          umtx    780      98K       -       780  128
>      p1003.1b      1       1K       -         1  16
>          SWAP      2    3281K       -         2  64
>        kbdmux      8       9K       -         8  16,256,512,2048,4096
>        bus-sc    103     188K       -      4558  16,32,64,128,256,512,1024,2048,4096
>           bus   1174      93K       -     57792  16,32,64,128,256,512,1024
>         clist     54       7K       -        54  128
>       devstat     32      65K       -        32  32,4096
>  eventhandler     64       6K       -        64  64,128
>          kobj    276    1104K       -       387  4096
>          rman    144      18K       -       601  16,32,128
>        mfibuf      3      21K       -        12  32,256,512,2048,4096
>          sbuf      0       0K       -     14350  16,32,64,128,256,512,1024,2048,4096
>       scsi_da      0       0K       -       504  16
>       CAM SIM      4       1K       -         4  256
>         stack      0       0K       -       194  256
>     taskqueue     13       2K       -        13  16,32,128
>        Unitno     11       1K       -      4759  32,64
>           iov      0       0K       -      1193  16,64,256,512
>        select     98      13K       -        98  128
>      ioctlops      0       0K       -     14716  16,32,64,128,256,512,1024,4096
>           msg      4      30K       -         4  2048,4096
>           sem      4       8K       -         4  512,1024,2048,4096
>           shm      1      16K       -         1
>           tty     25      25K       -        25  1024
>           pts      3       1K       -         3  256
>      mbuf_tag      0       0K       -         2  32
>         shmfd      1       8K       -         1
>    CAM periph     54      14K       -       371  16,32,64,128,256
>           pcb     28     157K       -       148  16,32,128,1024,2048,4096
>        soname      5       1K       -     18699  16,32,128
>        biobuf      4       8K       -         6  2048
>      vfscache      1    1024K       -         1
>    cl_savebuf      0       0K       -         7  64,128
>   export_host      5       3K       -         5  512
>      vfs_hash      1     512K       -         1
>        vnodes      2       1K       -         2  256
>   vnodemarker      0       0K       -      4832  512
>         mount    222      15K       -       807  16,32,64,128,256,1024
>   ata_generic      1       1K       -         1  1024
>           BPF      4       1K       -         4  128
>   ether_multi     22       2K       -        24  16,32,64
>        ifaddr     54      14K       -        54  32,64,128,256,512,4096
>         ifnet      5       9K       -         5  256,2048
>         clone      5      20K       -         5  4096
>        arpcom      3       1K       -         3  16
>      routetbl     65      11K       -       949  32,64,128,256,512
>      in_multi      3       1K       -         3  64
>     sctp_iter      0       0K       -         3  256
>      sctp_ifn      3       1K       -         3  128
>      sctp_ifa      4       1K       -         4  128
>      sctp_vrf      1       1K       -         1  64
>     sctp_a_it      0       0K       -         3  16
>     hostcache      1      28K       -         1
>    acd_driver      1       2K       -         1  2048
>      syncache      1      92K       -         1
>     in6_multi     19       2K       -        19  32,64,128
>  ip6_moptions      1       1K       -         1  32
>       NFS FHA     13       3K       -  18480347  64,2048
>           rpc   1381     716K       -  82214178  32,64,128,256,512,2048
> audit_evclass    168       6K       -       205  32
>        newblk      1       1K       -         1  512
>      inodedep      1     512K       -         1
>       pagedep      1     128K       -         1
>   ufs_dirhash     45       9K       -        45  16,32,64,128,512
>     ufs_mount      3      11K       -         3  512,2048
>       UMAHash      3     130K       -        12  512,1024,2048,4096
>       acpidev     56       4K       -        56  64
>     vm_pgdata      2     129K       -         2  128
>       CAM XPT    589     369K       -      2047  32,64,128,256,1024
>       io_apic      2       4K       -         2  2048
>      pci_link     16       2K       -        16  32,128
>       memdesc      1       4K       -         1  4096
>           msi      3       1K       -         3  128
>      nexusdev      3       1K       -         3  16
>       entropy   1024      64K       -      1024  64
>  twa_commands      2     104K       -       101  256
>      atkbddev      2       1K       -         2  64
>          UART      6       4K       -         6  16,512,1024
>         USBHC      1       1K       -         1  128
>        USBdev     30      11K       -        30  16,32,64,128,256,512
>           USB    157      54K       -       190  16,32,64,128,256,1024
>        DEVFS1    152      76K       -       153  512
>        DEVFS3    165      42K       -       167  256
>         DEVFS     16       1K       -        17  16,128
>       solaris 822038  707024K       - 235790398  16,32,64,128,256,512,1024,2048,4096
>    kstat_data      2       1K       -         2  64
>
>
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>

from man tuning:

    kern.ipc.nmbclusters may be adjusted to increase the number of network
    mbufs the system is willing to allocate. Each cluster represents
    approximately 2K of memory, so a value of 1024 represents 2M of kernel
    memory reserved for network buffers. You can do a simple calculation
    to figure out how many you need. If you have a web server which maxes
    out at 1000 simultaneous connections, and each connection eats a 16K
    receive and 16K send buffer, you need approximately 32MB worth of
    network buffers to deal with it. A good rule of thumb is to multiply
    by 2, so 32MBx2 = 64MB/2K = 32768. So for this case you would want to
    set kern.ipc.nmbclusters to 32768. We recommend values between 1024
    and 4096 for machines with moderate amounts of memory, and between
    4096 and 32768 for machines with greater amounts of memory. Under no
    circumstances should you specify an arbitrarily high value for this
    parameter; it could lead to a boot-time crash. The -m option to
    netstat(1) may be used to observe network cluster use. Older versions
    of FreeBSD do not have this tunable and require that the kernel
    config(8) option NMBCLUSTERS be set instead.

    More and more programs are using the sendfile(2) system call to
    transmit files over the network. The kern.ipc.nsfbufs sysctl controls
    the number of file system buffers sendfile(2) is allowed to use to
    perform its work. This parameter nominally scales with kern.maxusers
    so you should not need to modify this parameter except under extreme
    circumstances. See the TUNING section in the sendfile(2) manual page
    for details.

-- 
Adam Vande More
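
For what it's worth, the rule of thumb from the man page excerpt can be written out as a small calculation. This is only a sketch of that arithmetic: the function name and its parameters are made up for illustration and are not part of any FreeBSD tool, and the 2K cluster size and the factor of 2 are taken straight from the quoted text.

```python
# Sketch of the tuning(7) rule of thumb quoted above.
# estimate_nmbclusters and its parameters are hypothetical names used
# only for illustration; they do not exist in FreeBSD.

CLUSTER_BYTES = 2 * 1024  # the man page says each cluster is roughly 2K


def estimate_nmbclusters(connections, recv_buf, send_buf, safety_factor=2):
    """Estimate a kern.ipc.nmbclusters value for a given workload.

    connections   -- peak number of simultaneous connections
    recv_buf      -- receive buffer size per connection, in bytes
    send_buf      -- send buffer size per connection, in bytes
    safety_factor -- the "multiply by 2" margin from the man page
    """
    buffer_bytes = connections * (recv_buf + send_buf)
    return (buffer_bytes * safety_factor) // CLUSTER_BYTES


if __name__ == "__main__":
    # Man page example: about 1000 connections, 16K receive + 16K send.
    print(estimate_nmbclusters(1000, 16 * 1024, 16 * 1024))
    # The man page rounds 1000 x 32K up to 32MB, i.e. 1024 connections.
    print(estimate_nmbclusters(1024, 16 * 1024, 16 * 1024))
```

With exactly 1000 connections this comes out slightly below the man page's 32768, because the man page rounds the buffer total up to 32MB before doubling. The connection count and buffer sizes for an NFS/ZFS mail store would have to be measured on the box itself; the numbers here are just the man page's web server example.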