Date: Thu, 28 Sep 2017 16:32:12 -0400
From: Josh Gitlin <jgitlin@goboomtown.com>
To: Alan Somers <asomers@freebsd.org>
Cc: FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: Help with mbuf exhaustion
Message-ID: <507664F2-8215-4D8B-B474-EA2E8B46D1AD@goboomtown.com>
In-Reply-To: <CAOtMX2j7k7GLO2hm-QNJ9yef1V5WMP9SVbQs0p+g7RJOabg-5w@mail.gmail.com>
References: <322F6F4B-1153-4ECE-B854-B2981B0CDDF2@goboomtown.com> <CAOtMX2j7k7GLO2hm-QNJ9yef1V5WMP9SVbQs0p+g7RJOabg-5w@mail.gmail.com>
My mistake, the "1" was cut off from my message. We are actually on FreeBSD 10.3-RELEASE-p21, _not_ p2.

--
<http://www.goboomtown.com/>
Josh Gitlin
Senior Full Stack Developer
(415) 690-1610 x155

Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.

> On Sep 28, 2017, at 4:30 PM, Alan Somers <asomers@freebsd.org> wrote:
>
> First of all, 10.3-RELEASE-p2 is very old and has known security
> vulnerabilities. Have you tried 10.3-RELEASE-p21 or even 10.4-RELEASE?
>
> On Thu, Sep 28, 2017 at 1:30 PM, Josh Gitlin <jgitlin@goboomtown.com> wrote:
>> Hi FreeBSD Gurus!
>>
>> We're having an issue with mbuf exhaustion on a FreeBSD server which was recently upgraded from 10.3-STABLE to 10.3-RELEASE-p2. In the course of normal operation, we see mbuf usage steadily increasing until we reach the kern.ipc.nmbufs limit, at which point the machine becomes unresponsive over the network (due to lack of mbufs for network access) and the console displays:
>>
>> cxl0: Interface stopped DISTRIBUTING, possible flapping
>> cxl1: Interface stopped DISTRIBUTING, possible flapping
>> [zone: mbuf] kern.ipc.nmbufs limit reached
>> [zone: mbuf] kern.ipc.nmbufs limit reached
>>
>> The machine runs pf and acts as a packet filter, router, gateway and DHCP/DNS server. It has two Chelsio NICs in it, and is a CARP master with a secondary. The secondary has identical hardware and software configuration and does not exhibit this issue.
>>
>> Given the downtime this causes, we set up our Nagios/Check_MK to graph the output of `netstat -m` and alert when mbufs in use approaches `kern.ipc.nmbufs`, and we see a steady linear increase in mbuf usage until we reboot:
>>
>> https://i.stack.imgur.com/8bzAq.png
>>
>> mbuf *clusters* in use does not change when this happens, and increasing mbuf cluster limits has no effect:
>>
>> https://i.stack.imgur.com/7OzdN.png
>>
>> This looks to me like a kernel bug of some sort; I'm looking for advice on further troubleshooting, or assistance in resolving it!
>>
>> Helpful (maybe) information:
>>
>> netstat -m:
>>
>> 679270/3080/682350 mbufs in use (current/cache/total)
>> 10243/1657/11900/985360 mbuf clusters in use (current/cache/total/max)
>> 10243/1648 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 8128/482/8610/124025 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 0/0/0/36748 9k jumbo clusters in use (current/cache/total/max)
>> 128/0/128/20670 16k jumbo clusters in use (current/cache/total/max)
>> 224863K/6012K/230875K bytes allocated to network (current/cache/total)
>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>>
>> vmstat -z | grep -E '^ITEM|mbuf':
>>
>> ITEM               SIZE    LIMIT     USED   FREE        REQ  FAIL  SLEEP
>> mbuf_packet:        256, 1587540,   10239,  1652,  84058893,    0,     0
>> mbuf:               256, 1587540,  671533,  1206, 914478880,    0,     0
>> mbuf_cluster:      2048,  985360,   11891,     9,     11891,    0,     0
>> mbuf_jumbo_page:   4096,  124025,    8128,   512,  15011847,    0,     0
>> mbuf_jumbo_9k:     9216,   36748,       0,     0,         0,    0,     0
>> mbuf_jumbo_16k:   16384,   20670,     128,     0,       128,    0,     0
>> mbuf_ext_refcnt:      4,       0,       0,     0,         0,    0,     0
>>
>> vmstat -m:
>>
>> Type  InUse  MemUse  HighUse  Requests  Size(s)
>> NFSD lckfile  1  1K  -  1  256
>> filedesc  103  383K  -  1134731  16,32,128,2048,4096,8192,16384,65536
>> sigio  1  1K  -  1  64
>> filecaps  0  0K  -  973  64
>> kdtrace  292  59K  -  1099386  64,256
>> kenv  121  13K  -  125  16,32,64,128,8192
>> kqueue  14  22K  -  5374  256,2048,8192
>> proc-args  54  5K  -  578448  16,32,64,128,256
>> hhook  2  1K  -  2  256
>> ithread  146  24K  -  146  32,128,256
>> KTRACE  100  13K  -  100  128
>> NFS fh  1  1K  -  584  32
>> linker  207  1052K  -  234  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>> lockf  29  3K  -  20042  64,128
>> loginclass  2  1K  -  1192  64
>> devbuf  17205  36362K  -  17523  16,32,64,128,256,512,1024,2048,4096,8192,65536
>> temp  149  51K  -  1280113  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>> ip6opt  5  2K  -  6  256
>> ip6ndp  27  2K  -  27  64,128
>> module  230  29K  -  230  128
>> mtx_pool  2  16K  -  2  8192
>> osd  3  1K  -  5  16,32,64
>> pmchooks  1  1K  -  1  128
>> pgrp  30  4K  -  2222  128
>> session  29  4K  -  2187  128
>> proc  2  32K  -  2  16384
>> subproc  211  368K  -  1099014  512,4096
>> cred  204  32K  -  6025704  64,256
>> plimit  19  5K  -  3985  256
>> uidinfo  9  5K  -  11892  128,4096
>> NFSD session  1  1K  -  1  1024
>> sysctl  0  0K  -  63851  16,32,64
>> sysctloid  7196  365K  -  7369  16,32,64,128
>> sysctltmp  0  0K  -  17834  16,32,64,128
>> tidhash  1  32K  -  1  32768
>> callout  5  2184K  -  5
>> umtx  522  66K  -  522  128
>> p1003.1b  1  1K  -  1  16
>> SWAP  2  549K  -  2  64
>> bus  802  86K  -  6536  16,32,64,128,256,1024
>> bus-sc  57  1671K  -  2431  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>> newnfsmnt  1  1K  -  1  1024
>> devstat  8  17K  -  8  32,4096
>> eventhandler  116  10K  -  116  64,128
>> kobj  124  496K  -  296  4096
>> acpiintr  1  1K  -  1  64
>> Per-cpu  1  1K  -  1  32
>> acpica  14355  1420K  -  216546  16,32,64,128,256,512,1024,2048,4096
>> pci_link  16  2K  -  16  64,128
>> pfs_nodes  21  6K  -  21  256
>> rman  316  37K  -  716  16,32,128
>> sbuf  1  1K  -  41375  16,32,64,128,256,512,1024,2048,4096,8192,16384
>> sglist  8  8K  -  8  1024
>> GEOM  88  15K  -  1871  16,32,64,128,256,512,1024,2048,8192,16384
>> acpipwr  5  1K  -  5  64
>> taskqueue  43  7K  -  43  16,32,256
>> Unitno  22  2K  -  1208250  32,64
>> vmem  3  144K  -  6  1024,4096,8192
>> ioctlops  0  0K  -  185700  256,512,1024,2048,4096
>> select  89  12K  -  89  128
>> iov  0  0K  -  19808992  16,64,128,256,512,1024
>> msg  4  30K  -  4  2048,4096,8192,16384
>> sem  4  106K  -  4  2048,4096
>> shm  1  32K  -  1  32768
>> tty  20  20K  -  499  1024
>> pts  1  1K  -  480  256
>> accf  2  1K  -  2  64
>> mbuf_tag  0  0K  -  291472282  32,64,128
>> shmfd  1  8K  -  1  8192
>> soname  32  4K  -  1210442  16,32,128
>> pcb  36  663K  -  76872  16,32,64,128,1024,2048,8192
>> CAM CCB  0  0K  -  182128  2048
>> acl  0  0K  -  2  4096
>> vfscache  1  2048K  -  1
>> cl_savebuf  0  0K  -  480  64
>> vfs_hash  1  1024K  -  1
>> vnodes  1  1K  -  1  256
>> entropy  1026  65K  -  49107  32,64,4096
>> mount  64  3K  -  140  16,32,64,128,256
>> vnodemarker  0  0K  -  4212  512
>> BPF  112  20504K  -  131  16,64,128,512,4096
>> CAM path  11  1K  -  63  32
>> ifnet  29  57K  -  30  128,256,2048
>> ifaddr  315  105K  -  315  32,64,128,256,512,2048,4096
>> ether_multi  232  13K  -  282  16,32,64
>> clone  10  2K  -  10  128
>> arpcom  23  1K  -  23  16
>> gif  4  1K  -  4  32,256
>> lltable  155  53K  -  551  256,512
>> UART  6  5K  -  6  16,1024
>> vlan  56  5K  -  74  64,128
>> acpitask  1  16K  -  1  16384
>> acpisem  110  14K  -  110  128
>> raid_data  0  0K  -  108  32,128,256
>> routetbl  516  136K  -  101735  32,64,128,256,512
>> igmp  28  7K  -  28  256
>> CARP  76  30K  -  83  16,32,64,128,256,512,1024
>> ipid  2  24K  -  2  8192,16384
>> in_mfilter  112  112K  -  112  1024
>> in_multi  43  11K  -  43  256
>> ip_moptions  224  35K  -  224  64,256
>> CAM periph  7  2K  -  19  16,32,64,128,256
>> acpidev  128  8K  -  128  64
>> CAM queue  15  5K  -  39  16,32,512
>> encap_export_host  4  4K  -  4  1024
>> sctp_a_it  0  0K  -  36  16
>> sctp_vrf  1  1K  -  1  64
>> sctp_ifa  115  15K  -  204  128
>> sctp_ifn  21  3K  -  23  128
>> sctp_iter  0  0K  -  36  256
>> hostcache  1  32K  -  1  32768
>> syncache  1  64K  -  1  65536
>> in6_mfilter  1  1K  -  1  1024
>> in6_multi  15  2K  -  15  32,256
>> ip6_moptions  2  1K  -  2  32,256
>> CAM dev queue  6  1K  -  6  64
>> kbdmux  6  22K  -  6  16,512,1024,2048,16384
>> mld  26  4K  -  26  128
>> LED  20  2K  -  20  16,128
>> inpcbpolicy  365  12K  -  119277  32
>> secasvar  7  2K  -  214  256
>> sahead  10  3K  -  10  256
>> ipsecpolicy  748  187K  -  241562  256
>> ipsecrequest  18  3K  -  72  128
>> ipsec-misc  56  2K  -  1712  16,32,64
>> ipsec-saq  0  0K  -  24  128
>> ipsec-reg  3  1K  -  3  32
>> pfsync  2  2K  -  893  32,256,1024
>> pf_temp  0  0K  -  78  128
>> pf_hash  3  2880K  -  3
>> pf_ifnet  36  11K  -  9510  256,2048
>> pf_tag  7  1K  -  7  128
>> pf_altq  5  2K  -  125  256
>> pf_rule  964  904K  -  17500  128,1024
>> pf_osfp  1130  115K  -  28250  64,128
>> pf_table  49  98K  -  948  2048
>> crypto  37  11K  -  1072  64,128,256,512,1024
>> xform  7  1K  -  1530156  16,32,64,128,256
>> rpc  12  20K  -  304  64,128,512,1024,8192
>> audit_evclass  187  6K  -  231  32
>> ufs_dirhash  93  18K  -  93  16,32,64,128,256,512
>> ufs_quota  1  1024K  -  1
>> ufs_mount  3  13K  -  3  512,4096,8192
>> vm_pgdata  2  513K  -  2  128
>> UMAHash  5  6K  -  10  512,1024,2048
>> CAM SIM  6  2K  -  6  256
>> CAM XPT  30  3K  -  1850  16,32,64,128,256,512,1024,2048,65536
>> CAM DEV  9  18K  -  16  2048
>> fpukern_ctx  3  6K  -  3  2048
>> memdesc  1  4K  -  1  4096
>> USB  23  33K  -  24  16,128,256,512,1024,2048,4096
>> DEVFS3  136  34K  -  2027  256
>> DEVFS1  108  54K  -  594  512
>> apmdev  1  1K  -  1  128
>> madt_table  0  0K  -  1  4096
>> DEVFS_RULE  55  26K  -  55  64,512
>> DEVFS  12  1K  -  13  16,128
>> DEVFSP  22  2K  -  167  64
>> io_apic  1  2K  -  1  2048
>> isadev  8  1K  -  8  128
>> MCA  15  2K  -  15  32,128
>> msi  30  4K  -  30  128
>> nexusdev  5  1K  -  5  16
>> USBdev  21  8K  -  21  32,64,128,256,512,1024,4096
>> NFSD V4client  1  1K  -  1  256
>> cdev  5  2K  -  5  256
>> cxgbe  41  956K  -  44  128,256,512,1024,2048,4096,8192,16384
>> ipmi  0  0K  -  20155  128,2048
>> htcp data  127  4K  -  13675  32
>> aesni_data  3  3K  -  3  1024
>> solaris  142  12302K  -  3189  16,32,64,128,512,1024,8192
>> kstat_data  6  1K  -  6  64
>>
>> TCP States:
>>
>> https://i.stack.imgur.com/G7850.png
>>
>> --
>> <http://www.goboomtown.com/>
>> Josh Gitlin
>> Senior Full Stack Developer
>> (415) 690-1610 x155
>>
>> Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
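[Editor's note: for anyone wiring up similar monitoring, the check described in the thread amounts to parsing the first line of `netstat -m` and comparing the current count against `kern.ipc.nmbufs`. A minimal sketch in Python; the sample line is taken from the report above, while `WARN_RATIO` and the hard-coded limit are illustrative values, not the poster's actual Check_MK configuration.]

```python
import re

# First line of `netstat -m` output, copied from the report above.
NETSTAT_LINE = "679270/3080/682350 mbufs in use (current/cache/total)"

# Illustrative alerting threshold (80% of the limit); not a recommendation.
WARN_RATIO = 0.8

def parse_mbufs_in_use(line):
    """Return (current, cache, total) from the 'mbufs in use' line."""
    m = re.match(r"(\d+)/(\d+)/(\d+) mbufs in use", line.strip())
    if not m:
        raise ValueError("unexpected netstat -m output: %r" % line)
    return tuple(int(x) for x in m.groups())

def mbuf_status(line, nmbufs_limit, warn_ratio=WARN_RATIO):
    """Compare current mbuf usage against the kern.ipc.nmbufs limit."""
    current, _cache, _total = parse_mbufs_in_use(line)
    ratio = current / nmbufs_limit
    state = "WARN" if ratio >= warn_ratio else "OK"
    return state, current, ratio

if __name__ == "__main__":
    # Limit taken from the mbuf zone LIMIT column in the vmstat -z output above.
    state, current, ratio = mbuf_status(NETSTAT_LINE, 1587540)
    print("%s: %d mbufs in use (%.0f%% of limit)" % (state, current, 100 * ratio))
```

On the live system the line would come from `subprocess` running `netstat -m` rather than a constant, and the state string would be emitted in whatever format the local Nagios/Check_MK plugin expects.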
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?507664F2-8215-4D8B-B474-EA2E8B46D1AD>
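[Editor's note: a useful next troubleshooting step with output like the above is to diff the USED column of `vmstat -z` between two samples, which shows directly that the plain `mbuf` zone climbs while `mbuf_cluster` stays flat. A rough sketch; the first sample is abbreviated from the report above and the second sample's numbers are hypothetical, purely to illustrate the diff.]

```python
def parse_vmstat_z(text):
    """Map zone name -> USED count from `vmstat -z`-style rows.

    Expects rows like 'mbuf: 256, 1587540, 671533, 1206, 914478880, 0, 0'
    (ITEM, then SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP).
    """
    used = {}
    for line in text.strip().splitlines():
        name, rest = line.split(":", 1)
        fields = [f.strip() for f in rest.split(",")]
        used[name.strip()] = int(fields[2])  # USED is the third numeric field
    return used

def growing_zones(sample_a, sample_b):
    """Zones whose USED count increased between two samples, biggest first."""
    a, b = parse_vmstat_z(sample_a), parse_vmstat_z(sample_b)
    deltas = {z: b[z] - a[z] for z in a if z in b and b[z] > a[z]}
    return sorted(deltas.items(), key=lambda kv: -kv[1])

# First sample: rows abbreviated from the vmstat -z output in the report.
SAMPLE_A = """\
mbuf_packet: 256, 1587540, 10239, 1652, 84058893, 0, 0
mbuf: 256, 1587540, 671533, 1206, 914478880, 0, 0
mbuf_cluster: 2048, 985360, 11891, 9, 11891, 0, 0
"""

# Second sample: hypothetical later numbers, only to show the idea.
SAMPLE_B = """\
mbuf_packet: 256, 1587540, 10241, 1650, 84061002, 0, 0
mbuf: 256, 1587540, 684210, 1100, 915000000, 0, 0
mbuf_cluster: 2048, 985360, 11891, 9, 11891, 0, 0
"""

if __name__ == "__main__":
    for zone, delta in growing_zones(SAMPLE_A, SAMPLE_B):
        print("%-14s +%d" % (zone, delta))
```

Run periodically against real `vmstat -z` output, this pinpoints which UMA zone is leaking; the same diffing idea applies to the Requests column of `vmstat -m` (e.g. the very large `mbuf_tag` request count on a pf/ipsec box).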