Date: Tue, 3 Nov 2009 08:32:24 -0500 (EST)
From: Weldon S Godfrey 3
To: Gavin Atkinson
In-Reply-To: <1257185816.44755.29.camel@buffy.york.ac.uk>
Cc: freebsd-current@FreeBSD.org
Subject: Re: FreeBSD 8.0 - network stack crashes?

If memory serves me right, sometime around Yesterday, Gavin Atkinson told me:

Gavin, thank you A LOT for helping us with this. I have answered as much as I can from the most recent crash below.

We did hit max mbufs. The limit is at 25K clusters, which is the default. I have upped it to 32K because a rather old article mentioned that as the top end, and I need to get into work before going higher so that I am not doing this over a remote console. I have already set it to reboot next with 64K clusters.
I already have kmem maxed to what is bootable in 8.0 (or at least what was at one time), 4GB. How high can I safely go?

This is an NFS server running ZFS, with sustained 5 min averages of 120-200Mb/s, running as a store for a mail system.

> Some things that would be useful:
>
> - Does "arp -da" fix things?

No, it hangs like ssh, route add, etc.

> - What's the output of "netstat -m" while the networking is broken?

Tue Nov 3 07:02:11 CST 2009
36971/2033/39004 mbufs in use (current/cache/total)
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
24314/731 mbuf+clusters out of packet secondary zone in use (current/cache)
0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
58980K/2110K/61091K bytes allocated to network (current/cache/total)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

> - What does CTRL-T show for the hung SSH or route processes?

Of the arp:

load: 0.01 cmd: arp 6144 [zonelimit] 0.00u 0.00s 0% 996k

> - What does "procstat -kk" on the same processes show?

Sorry, I couldn't get this to run this time, remote console issues.

> - Does going to single user mode ("init 1" and killing off any leftover
> processes) cause the machine to start working again? If so, what's the
> output of "netstat -m" afterwards?
No, mbuf was still maxed out.

Below is the last vmstat -m:

Type            InUse   MemUse HighUse  Requests  Size(s)
ntfs_nthash         1     512K       -         1
pfs_nodes          20       5K       -        20  256
GEOM              262      52K       -      4551  16,32,64,128,256,512,1024,2048
isadev              9       2K       -         9  128
cdev               13       4K       -        13  256
sigio               1       1K       -         1  64
filedesc          127      64K       -      6412  512,1024
kenv               75      11K       -        80  16,32,64,128
kqueue              0       0K       -       188  256,2048
proc-args          41       2K       -      5647  16,32,64,128
scsi_cd             0       0K       -       333  16
ithread           119      21K       -       119  32,128,256
acpica            888      78K       -    121045  16,32,64,128,256,512,1024
KTRACE            100      13K       -       100  128
acpitask            0       0K       -         1  64
linker            139     596K       -       181  16,32,64,128,256,512,1024,2048
lockf              11       2K       -       399  64,128
CAM dev queue       4       1K       -         4  128
ip6ndp              5       1K       -         5  64,128
temp               48     562K       -  14544952  16,32,64,128,256,512,1024,2048,4096
devbuf          17105   36341K       -     24988  16,32,64,128,512,1024,2048,4096
module            420      53K       -       420  128
mtx_pool            1       8K       -         1
osd                 2       1K       -         2  16
CAM queue          62      52K       -      2211  16,32,64,128,256,512,1024,2048
subproc           562     722K       -      6851  512,4096
proc                2      16K       -         2
session            33       5K       -       127  128
pgrp               37       5K       -       190  128
cred               62      16K       -  29192756  256
uidinfo             4       3K       -        99  64,2048
plimit             17       5K       -       910  256
acpisem            15       1K       -        15  64
sysctltmp           0       0K       -     13867  16,32,64,128,256,512,1024,2048,4096
sysctloid        5400     270K       -      5782  16,32,64,128
sysctl              0       0K       -     11423  16,32,64
callout             7    3584K       -         7
umtx              780      98K       -       780  128
p1003.1b            1       1K       -         1  16
SWAP                2    3281K       -         2  64
kbdmux              8       9K       -         8  16,256,512,2048,4096
bus-sc            103     188K       -      4558  16,32,64,128,256,512,1024,2048,4096
bus              1174      93K       -     57792  16,32,64,128,256,512,1024
clist              54       7K       -        54  128
devstat            32      65K       -        32  32,4096
eventhandler       64       6K       -        64  64,128
kobj              276    1104K       -       387  4096
rman              144      18K       -       601  16,32,128
mfibuf              3      21K       -        12  32,256,512,2048,4096
sbuf                0       0K       -     14350  16,32,64,128,256,512,1024,2048,4096
scsi_da             0       0K       -       504  16
CAM SIM             4       1K       -         4  256
stack               0       0K       -       194  256
taskqueue          13       2K       -        13  16,32,128
Unitno             11       1K       -      4759  32,64
iov                 0       0K       -      1193  16,64,256,512
select             98      13K       -        98  128
ioctlops            0       0K       -     14716  16,32,64,128,256,512,1024,4096
msg                 4      30K       -         4  2048,4096
sem                 4       8K       -         4  512,1024,2048,4096
shm                 1      16K       -         1
tty                25      25K       -        25  1024
pts                 3       1K       -         3  256
mbuf_tag            0       0K       -         2  32
shmfd               1       8K       -         1
CAM periph         54      14K       -       371  16,32,64,128,256
pcb                28     157K       -       148  16,32,128,1024,2048,4096
soname              5       1K       -     18699  16,32,128
biobuf              4       8K       -         6  2048
vfscache            1    1024K       -         1
cl_savebuf          0       0K       -         7  64,128
export_host         5       3K       -         5  512
vfs_hash            1     512K       -         1
vnodes              2       1K       -         2  256
vnodemarker         0       0K       -      4832  512
mount             222      15K       -       807  16,32,64,128,256,1024
ata_generic         1       1K       -         1  1024
BPF                 4       1K       -         4  128
ether_multi        22       2K       -        24  16,32,64
ifaddr             54      14K       -        54  32,64,128,256,512,4096
ifnet               5       9K       -         5  256,2048
clone               5      20K       -         5  4096
arpcom              3       1K       -         3  16
routetbl           65      11K       -       949  32,64,128,256,512
in_multi            3       1K       -         3  64
sctp_iter           0       0K       -         3  256
sctp_ifn            3       1K       -         3  128
sctp_ifa            4       1K       -         4  128
sctp_vrf            1       1K       -         1  64
sctp_a_it           0       0K       -         3  16
hostcache           1      28K       -         1
acd_driver          1       2K       -         1  2048
syncache            1      92K       -         1
in6_multi          19       2K       -        19  32,64,128
ip6_moptions        1       1K       -         1  32
NFS FHA            13       3K       -  18480347  64,2048
rpc              1381     716K       -  82214178  32,64,128,256,512,2048
audit_evclass     168       6K       -       205  32
newblk              1       1K       -         1  512
inodedep            1     512K       -         1
pagedep             1     128K       -         1
ufs_dirhash        45       9K       -        45  16,32,64,128,512
ufs_mount           3      11K       -         3  512,2048
UMAHash             3     130K       -        12  512,1024,2048,4096
acpidev            56       4K       -        56  64
vm_pgdata           2     129K       -         2  128
CAM XPT           589     369K       -      2047  32,64,128,256,1024
io_apic             2       4K       -         2  2048
pci_link           16       2K       -        16  32,128
memdesc             1       4K       -         1  4096
msi                 3       1K       -         3  128
nexusdev            3       1K       -         3  16
entropy          1024      64K       -      1024  64
twa_commands        2     104K       -       101  256
atkbddev            2       1K       -         2  64
UART                6       4K       -         6  16,512,1024
USBHC               1       1K       -         1  128
USBdev             30      11K       -        30  16,32,64,128,256,512
USB               157      54K       -       190  16,32,64,128,256,1024
DEVFS1            152      76K       -       153  512
DEVFS3            165      42K       -       167  256
DEVFS              16       1K       -        17  16,128
solaris        822038  707024K       - 235790398  16,32,64,128,256,512,1024,2048,4096
kstat_data          2       1K       -         2  64
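
For anyone wanting to catch this condition from a script, here is a minimal sketch that reads the slash-separated counters from "netstat -m" and reports whether the mbuf cluster zone has reached its limit. It assumes the line format shown in the netstat -m output earlier in this message; the function names parse_netstat_m and cluster_report are made up for illustration and are not part of any FreeBSD tool. The sample text is three lines copied from that output.

```python
import re

# Three lines copied from the netstat -m output earlier in this message.
NETSTAT_M = """\
36971/2033/39004 mbufs in use (current/cache/total)
24869/731/25600/25600 mbuf clusters in use (current/cache/total/max)
0/201276/90662 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""


def parse_netstat_m(text):
    """Map each line's description to its list of slash-separated counters.

    Assumption: every line of interest looks like
    "<n>/<n>/... <description> (<field names>)".
    """
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+(?:/\d+)+)\s+(.*?)\s*\(", line)
        if m:
            stats[m.group(2)] = [int(n) for n in m.group(1).split("/")]
    return stats


def cluster_report(text):
    """Summarize mbuf cluster usage (hypothetical helper for illustration)."""
    stats = parse_netstat_m(text)
    # Field order on this line is current/cache/total/max.
    current, cache, total, limit = stats["mbuf clusters in use"]
    # Field order on this line is mbufs/clusters/mbuf+clusters.
    clusters_denied = stats["requests for mbufs denied"][1]
    return {
        "total": total,
        "max": limit,
        "at_limit": total >= limit,
        "clusters_denied": clusters_denied,
    }


if __name__ == "__main__":
    print(cluster_report(NETSTAT_M))
```

With the numbers above, the cluster total equals the 25600 maximum and 201276 cluster requests were denied, which matches the [zonelimit] wait state seen on the hung arp process.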