From: Josh Gitlin
Subject: Help with mbuf exhaustion
Date: Thu, 28 Sep 2017 15:30:08 -0400
To: freebsd-net@freebsd.org
Message-Id: <322F6F4B-1153-4ECE-B854-B2981B0CDDF2@goboomtown.com>

Hi FreeBSD Gurus!

We're having an issue with mbuf exhaustion on a FreeBSD server which was
recently upgraded from 10.3-STABLE to 10.3-RELEASE-p2. In the course of
normal operation, we see mbuf usage steadily increasing until we reach
the kern.ipc.nmbufs limit, at which point the machine becomes
unresponsive over the network (due to lack of mbufs for network access)
and the console displays:

    cxl0: Interface stopped DISTRIBUTING, possible flapping
    cxl1: Interface stopped DISTRIBUTING, possible flapping
    [zone: mbuf] kern.ipc.nmbufs limit reached
    [zone: mbuf] kern.ipc.nmbufs limit reached

The machine runs pf and acts as a packet filter, router, gateway and
DHCP/DNS server. It has two Chelsio NICs in it, and is a CARP master
with a secondary.
The secondary has identical configuration of hardware and software and
does not exhibit this issue.

Given the downtime this causes, we set up our Nagios/Check_MK to graph
the output of `netstat -m` and alert when mbufs in use approaches
`kern.ipc.nmbufs`, and we see a steady linear increase in mbuf usage
until we reboot:

https://i.stack.imgur.com/8bzAq.png

mbuf *clusters* in use does not change when this happens, and increasing
mbuf cluster limits has no effect:

https://i.stack.imgur.com/7OzdN.png

This appears to me to be a kernel bug of some sort. I'm looking for
advice on further troubleshooting or assistance in resolving this!

Helpful (maybe) information:

netstat -m:

679270/3080/682350 mbufs in use (current/cache/total)
10243/1657/11900/985360 mbuf clusters in use (current/cache/total/max)
10243/1648 mbuf+clusters out of packet secondary zone in use (current/cache)
8128/482/8610/124025 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/36748 9k jumbo clusters in use (current/cache/total/max)
128/0/128/20670 16k jumbo clusters in use (current/cache/total/max)
224863K/6012K/230875K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

vmstat -z|grep -E '^ITEM|mbuf':

ITEM                   SIZE    LIMIT     USED     FREE        REQ FAIL SLEEP
mbuf_packet:            256, 1587540,   10239,    1652,  84058893,   0,   0
mbuf:                   256, 1587540,  671533,    1206, 914478880,   0,   0
mbuf_cluster:          2048,  985360,   11891,       9,     11891,   0,   0
mbuf_jumbo_page:       4096,  124025,    8128,     512,  15011847,   0,   0
mbuf_jumbo_9k:         9216,   36748,       0,       0,         0,   0,   0
mbuf_jumbo_16k:       16384,   20670,     128,       0,       128,   0,   0
mbuf_ext_refcnt:          4,       0,       0,       0,         0,   0,   0

vmstat -m:

Type               InUse  MemUse HighUse  Requests  Size(s)
NFSD lckfile           1      1K       -         1  256
filedesc             103    383K       -   1134731  16,32,128,2048,4096,8192,16384,65536
sigio                  1      1K       -         1  64
filecaps               0      0K       -       973  64
kdtrace              292     59K       -   1099386  64,256
kenv                 121     13K       -       125  16,32,64,128,8192
kqueue                14     22K       -      5374  256,2048,8192
proc-args             54      5K       -    578448  16,32,64,128,256
hhook                  2      1K       -         2  256
ithread              146     24K       -       146  32,128,256
KTRACE               100     13K       -       100  128
NFS fh                 1      1K       -       584  32
linker               207   1052K       -       234  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
lockf                 29      3K       -     20042  64,128
loginclass             2      1K       -      1192  64
devbuf             17205  36362K       -     17523  16,32,64,128,256,512,1024,2048,4096,8192,65536
temp                 149     51K       -   1280113  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
ip6opt                 5      2K       -         6  256
ip6ndp                27      2K       -        27  64,128
module               230     29K       -       230  128
mtx_pool               2     16K       -         2  8192
osd                    3      1K       -         5  16,32,64
pmchooks               1      1K       -         1  128
pgrp                  30      4K       -      2222  128
session               29      4K       -      2187  128
proc                   2     32K       -         2  16384
subproc              211    368K       -   1099014  512,4096
cred                 204     32K       -   6025704  64,256
plimit                19      5K       -      3985  256
uidinfo                9      5K       -     11892  128,4096
NFSD session           1      1K       -         1  1024
sysctl                 0      0K       -     63851  16,32,64
sysctloid           7196    365K       -      7369  16,32,64,128
sysctltmp              0      0K       -     17834  16,32,64,128
tidhash                1     32K       -         1  32768
callout                5   2184K       -         5
umtx                 522     66K       -       522  128
p1003.1b               1      1K       -         1  16
SWAP                   2    549K       -         2  64
bus                  802     86K       -      6536  16,32,64,128,256,1024
bus-sc                57   1671K       -      2431  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
newnfsmnt              1      1K       -         1  1024
devstat                8     17K       -         8  32,4096
eventhandler         116     10K       -       116  64,128
kobj                 124    496K       -       296  4096
acpiintr               1      1K       -         1  64
Per-cpu                1      1K       -         1  32
acpica             14355   1420K       -    216546  16,32,64,128,256,512,1024,2048,4096
pci_link              16      2K       -        16  64,128
pfs_nodes             21      6K       -        21  256
rman                 316     37K       -       716  16,32,128
sbuf                   1      1K       -     41375  16,32,64,128,256,512,1024,2048,4096,8192,16384
sglist                 8      8K       -         8  1024
GEOM                  88     15K       -      1871  16,32,64,128,256,512,1024,2048,8192,16384
acpipwr                5      1K       -         5  64
taskqueue             43      7K       -        43  16,32,256
Unitno                22      2K       -   1208250  32,64
vmem                   3    144K       -         6  1024,4096,8192
ioctlops               0      0K       -    185700  256,512,1024,2048,4096
select                89     12K       -        89  128
iov                    0      0K       -  19808992  16,64,128,256,512,1024
msg                    4     30K       -         4  2048,4096,8192,16384
sem                    4    106K       -         4  2048,4096
shm                    1     32K       -         1  32768
tty                   20     20K       -       499  1024
pts                    1      1K       -       480  256
accf                   2      1K       -         2  64
mbuf_tag               0      0K       - 291472282  32,64,128
shmfd                  1      8K       -         1  8192
soname                32      4K       -   1210442  16,32,128
pcb                   36    663K       -     76872  16,32,64,128,1024,2048,8192
CAM CCB                0      0K       -    182128  2048
acl                    0      0K       -         2  4096
vfscache               1   2048K       -         1
cl_savebuf             0      0K       -       480  64
vfs_hash               1   1024K       -         1
vnodes                 1      1K       -         1  256
entropy             1026     65K       -     49107  32,64,4096
mount                 64      3K       -       140  16,32,64,128,256
vnodemarker            0      0K       -      4212  512
BPF                  112  20504K       -       131  16,64,128,512,4096
CAM path              11      1K       -        63  32
ifnet                 29     57K       -        30  128,256,2048
ifaddr               315    105K       -       315  32,64,128,256,512,2048,4096
ether_multi          232     13K       -       282  16,32,64
clone                 10      2K       -        10  128
arpcom                23      1K       -        23  16
gif                    4      1K       -         4  32,256
lltable              155     53K       -       551  256,512
UART                   6      5K       -         6  16,1024
vlan                  56      5K       -        74  64,128
acpitask               1     16K       -         1  16384
acpisem              110     14K       -       110  128
raid_data              0      0K       -       108  32,128,256
routetbl             516    136K       -    101735  32,64,128,256,512
igmp                  28      7K       -        28  256
CARP                  76     30K       -        83  16,32,64,128,256,512,1024
ipid                   2     24K       -         2  8192,16384
in_mfilter           112    112K       -       112  1024
in_multi              43     11K       -        43  256
ip_moptions          224     35K       -       224  64,256
CAM periph             7      2K       -        19  16,32,64,128,256
acpidev              128      8K       -       128  64
CAM queue             15      5K       -        39  16,32,512
encap_export_host      4      4K       -         4  1024
sctp_a_it              0      0K       -        36  16
sctp_vrf               1      1K       -         1  64
sctp_ifa             115     15K       -       204  128
sctp_ifn              21      3K       -        23  128
sctp_iter              0      0K       -        36  256
hostcache              1     32K       -         1  32768
syncache               1     64K       -         1  65536
in6_mfilter            1      1K       -         1  1024
in6_multi             15      2K       -        15  32,256
ip6_moptions           2      1K       -         2  32,256
CAM dev queue          6      1K       -         6  64
kbdmux                 6     22K       -         6  16,512,1024,2048,16384
mld                   26      4K       -        26  128
LED                   20      2K       -        20  16,128
inpcbpolicy          365     12K       -    119277  32
secasvar               7      2K       -       214  256
sahead                10      3K       -        10  256
ipsecpolicy          748    187K       -    241562  256
ipsecrequest          18      3K       -        72  128
ipsec-misc            56      2K       -      1712  16,32,64
ipsec-saq              0      0K       -        24  128
ipsec-reg              3      1K       -         3  32
pfsync                 2      2K       -       893  32,256,1024
pf_temp                0      0K       -        78  128
pf_hash                3   2880K       -         3
pf_ifnet              36     11K       -      9510  256,2048
pf_tag                 7      1K       -         7  128
pf_altq                5      2K       -       125  256
pf_rule              964    904K       -     17500  128,1024
pf_osfp             1130    115K       -     28250  64,128
pf_table              49     98K       -       948  2048
crypto                37     11K       -      1072  64,128,256,512,1024
xform                  7      1K       -   1530156  16,32,64,128,256
rpc                   12     20K       -       304  64,128,512,1024,8192
audit_evclass        187      6K       -       231  32
ufs_dirhash           93     18K       -        93  16,32,64,128,256,512
ufs_quota              1   1024K       -         1
ufs_mount              3     13K       -         3  512,4096,8192
vm_pgdata              2    513K       -         2  128
UMAHash                5      6K       -        10  512,1024,2048
CAM SIM                6      2K       -         6  256
CAM XPT               30      3K       -      1850  16,32,64,128,256,512,1024,2048,65536
CAM DEV                9     18K       -        16  2048
fpukern_ctx            3      6K       -         3  2048
memdesc                1      4K       -         1  4096
USB                   23     33K       -        24  16,128,256,512,1024,2048,4096
DEVFS3               136     34K       -      2027  256
DEVFS1               108     54K       -       594  512
apmdev                 1      1K       -         1  128
madt_table             0      0K       -         1  4096
DEVFS_RULE            55     26K       -        55  64,512
DEVFS                 12      1K       -        13  16,128
DEVFSP                22      2K       -       167  64
io_apic                1      2K       -         1  2048
isadev                 8      1K       -         8  128
MCA                   15      2K       -        15  32,128
msi                   30      4K       -        30  128
nexusdev               5      1K       -         5  16
USBdev                21      8K       -        21  32,64,128,256,512,1024,4096
NFSD V4client          1      1K       -         1  256
cdev                   5      2K       -         5  256
cxgbe                 41    956K       -        44  128,256,512,1024,2048,4096,8192,16384
ipmi                   0      0K       -     20155  128,2048
htcp data            127      4K       -     13675  32
aesni_data             3      3K       -         3  1024
solaris              142  12302K       -      3189  16,32,64,128,512,1024,8192
kstat_data             6      1K       -         6  64

TCP States: https://i.stack.imgur.com/G7850.png

-- 
Josh Gitlin
Senior Full Stack Developer
(415) 690-1610 x155
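
For anyone wanting to reproduce the monitoring described above (graphing
`netstat -m` and alerting as mbufs in use approach `kern.ipc.nmbufs`),
here is a minimal Python sketch. It is not the actual Check_MK plugin
used on this machine. The function names and the warn/crit thresholds
are hypothetical, and using 1587540 (the mbuf zone LIMIT from the
vmstat -z output above) as the limit assumes that value matches
kern.ipc.nmbufs.

```python
import re

# Matches the first line of `netstat -m`, e.g.
# "679270/3080/682350 mbufs in use (current/cache/total)"
MBUF_LINE = re.compile(r"^(\d+)/(\d+)/(\d+) mbufs in use \(current/cache/total\)$")


def parse_mbufs_in_use(netstat_output):
    """Return (current, cache, total) mbuf counts from `netstat -m` text."""
    for line in netstat_output.splitlines():
        match = MBUF_LINE.match(line.strip())
        if match:
            return tuple(int(group) for group in match.groups())
    raise ValueError("no 'mbufs in use' line found")


def check_mbufs(netstat_output, nmbufs_limit, warn=0.75, crit=0.90):
    """Return a Nagios-style (exit_code, message) pair.

    warn and crit are hypothetical thresholds, expressed as a fraction
    of the limit. Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
    """
    current, _cache, _total = parse_mbufs_in_use(netstat_output)
    ratio = current / nmbufs_limit
    if ratio >= crit:
        code, label = 2, "CRITICAL"
    elif ratio >= warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    message = "%s - %d of %d mbufs in use (%.1f%%)" % (
        label, current, nmbufs_limit, ratio * 100)
    return code, message


if __name__ == "__main__":
    # Sample line copied from the netstat -m output quoted in this message;
    # the limit is assumed to equal the mbuf zone LIMIT shown by vmstat -z.
    sample = "679270/3080/682350 mbufs in use (current/cache/total)"
    print(check_mbufs(sample, 1587540))
```

On the box itself the input text would come from running `netstat -m`
and the limit from `sysctl -n kern.ipc.nmbufs`. Only the "current"
figure is compared against the limit, since that is the number that
climbs linearly in the graph linked above.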