Date:      Thu, 28 Sep 2017 15:30:08 -0400
From:      Josh Gitlin <jgitlin@goboomtown.com>
To:        freebsd-net@freebsd.org
Subject:   Help with mbuf exhaustion
Message-ID:  <322F6F4B-1153-4ECE-B854-B2981B0CDDF2@goboomtown.com>

Hi FreeBSD Gurus!

We're having an issue with mbuf exhaustion on a FreeBSD server that was
recently upgraded from 10.3-STABLE to 10.3-RELEASE-p2. During normal
operation, mbuf usage increases steadily until it reaches the
kern.ipc.nmbufs limit. At that point the machine becomes unresponsive
over the network (no mbufs are left for network traffic) and the console
displays:

cxl0: Interface stopped DISTRIBUTING, possible flapping
cxl1: Interface stopped DISTRIBUTING, possible flapping
[zone: mbuf] kern.ipc.nmbufs limit reached
[zone: mbuf] kern.ipc.nmbufs limit reached

The machine runs pf and acts as a packet filter, router, gateway, and
DHCP/DNS server. It has two Chelsio NICs and is a CARP master with a
secondary. The secondary has an identical hardware and software
configuration and does not exhibit this issue.

Given the downtime this causes, we set up Nagios/Check_MK to graph the
output of `netstat -m` and to alert when mbufs in use approach
`kern.ipc.nmbufs`. The graph shows a steady, linear increase in mbuf
usage until we reboot:
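Because the growth is linear, a least-squares fit over periodic samples of "mbufs in use" can estimate how long remains before the limit is hit, which may help schedule a reboot ahead of an outage. This is a minimal sketch, assuming (seconds, mbufs in use) samples can be exported from the monitoring system. The function name `hours_until_limit` and the sample values are hypothetical. The limit 1587540 is the LIMIT of the `mbuf` zone in the `vmstat -z` output further down, which is assumed here to reflect `kern.ipc.nmbufs`.

```python
def hours_until_limit(samples, limit):
    """Fit used = a + b*t by ordinary least squares and return the hours
    from the last sample until the fitted line reaches `limit`.

    samples: list of (t_seconds, mbufs_in_use) pairs with at least two
    distinct times. Returns None if usage is flat or decreasing.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var  # mbufs per second
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    t_hit = (limit - intercept) / slope  # time at which the fit reaches the limit
    last_t = samples[-1][0]
    return max(0.0, (t_hit - last_t) / 3600.0)


# Hypothetical samples: one per hour, growing by 10,000 mbufs per hour.
samples = [(h * 3600, 600_000 + 10_000 * h) for h in range(6)]
print(round(hours_until_limit(samples, 1_587_540), 1))  # prints 93.8
```

A straight-line fit is only meaningful while the growth really is linear, as in the graph; any change in traffic pattern would shift the estimate.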

https://i.stack.imgur.com/8bzAq.png

The number of mbuf *clusters* in use does not change when this happens,
and increasing the mbuf cluster limits has no effect:

https://i.stack.imgur.com/7OzdN.png

To me this looks like a kernel bug of some sort. I'm looking for advice
on further troubleshooting, or for help resolving it!

Helpful (maybe) information:

netstat -m:

679270/3080/682350 mbufs in use (current/cache/total)
10243/1657/11900/985360 mbuf clusters in use (current/cache/total/max)
10243/1648 mbuf+clusters out of packet secondary zone in use (current/cache)
8128/482/8610/124025 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/36748 9k jumbo clusters in use (current/cache/total/max)
128/0/128/20670 16k jumbo clusters in use (current/cache/total/max)
224863K/6012K/230875K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
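The key observation is that the mbuf count keeps growing while the cluster count stays flat. A minimal sketch that pulls both current values out of `netstat -m` text and reports their difference and the share of the mbuf limit in use. The function name is hypothetical, and `nmbufs` is passed in by hand (1587540, assumed from the `vmstat -z` LIMIT column). The difference is only a rough indicator of mbufs without an attached 2k cluster, because clusters can be shared between mbufs and some mbufs carry 4k/9k/16k jumbo clusters instead.

```python
import re

# First field of each line is the "current" value.
MBUF_RE = re.compile(r"^(\d+)/(\d+)/(\d+) mbufs in use", re.M)
CLUSTER_RE = re.compile(r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.M)


def summarize_netstat_m(text, nmbufs):
    """Extract current mbuf and 2k-cluster counts from `netstat -m` text."""
    m = MBUF_RE.search(text)
    c = CLUSTER_RE.search(text)
    if m is None or c is None:
        raise ValueError("unrecognized netstat -m output")
    mbufs_current = int(m.group(1))
    clusters_current = int(c.group(1))
    return {
        "mbufs_current": mbufs_current,
        "clusters_current": clusters_current,
        # Rough indicator only: clusters can be shared, and some mbufs
        # carry jumbo clusters instead of 2k ones.
        "mbufs_minus_clusters": mbufs_current - clusters_current,
        "pct_of_nmbufs": 100.0 * mbufs_current / nmbufs,
    }


sample = """\
679270/3080/682350 mbufs in use (current/cache/total)
10243/1657/11900/985360 mbuf clusters in use (current/cache/total/max)
"""
print(summarize_netstat_m(sample, 1_587_540))
```

With the numbers above, roughly 669,000 of the 679,270 mbufs in use are not matched by a 2k cluster, which is consistent with the flat cluster graph.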

vmstat -z|grep -E '^ITEM|mbuf':

ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256, 1587540,   10239,    1652,84058893,   0,   0
mbuf:                   256, 1587540,  671533,    1206,914478880,   0,   0
mbuf_cluster:          2048, 985360,   11891,       9,   11891,   0,   0
mbuf_jumbo_page:       4096, 124025,    8128,     512,15011847,   0,   0
mbuf_jumbo_9k:         9216,  36748,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384,  20670,     128,       0,     128,   0,   0
mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
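To compare zones against their limits at a glance, the comma-separated `vmstat -z` rows can be parsed and USED divided by LIMIT. A minimal sketch, assuming the seven-column layout shown in the ITEM header above; other FreeBSD versions may print additional columns, and the function names are hypothetical. A LIMIT of 0 is treated as "no limit" and skipped.

```python
# Column order of `vmstat -z` rows, per the ITEM header in this output.
COLUMNS = ("size", "limit", "used", "free", "req", "fail", "sleep")


def parse_vmstat_z(text):
    """Map zone name -> {column: int} for rows like 'mbuf: 256, 1587540, ...'."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the ITEM header line has no colon
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) != len(COLUMNS):
            continue  # skip anything that is not a complete row
        zones[name.strip()] = dict(zip(COLUMNS, (int(f) for f in fields)))
    return zones


def zone_utilization(zones):
    """Percent of LIMIT currently USED, for zones that have a limit."""
    return {
        name: 100.0 * z["used"] / z["limit"]
        for name, z in zones.items()
        if z["limit"] > 0
    }


sample = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256, 1587540,   10239,    1652,84058893,   0,   0
mbuf:                   256, 1587540,  671533,    1206,914478880,   0,   0
mbuf_cluster:          2048, 985360,   11891,       9,   11891,   0,   0
mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
"""
for name, pct in zone_utilization(parse_vmstat_z(sample)).items():
    print(f"{name:16} {pct:5.1f}%")
```

Run against the output above, this shows the `mbuf` zone at about 42% of its limit while `mbuf_cluster` sits near 1%.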

vmstat -m:

         Type InUse MemUse HighUse Requests  Size(s)
 NFSD lckfile     1     1K       -        1  256
     filedesc   103   383K       -  1134731  16,32,128,2048,4096,8192,16384,65536
        sigio     1     1K       -        1  64
     filecaps     0     0K       -      973  64
      kdtrace   292    59K       -  1099386  64,256
         kenv   121    13K       -      125  16,32,64,128,8192
       kqueue    14    22K       -     5374  256,2048,8192
    proc-args    54     5K       -   578448  16,32,64,128,256
        hhook     2     1K       -        2  256
      ithread   146    24K       -      146  32,128,256
       KTRACE   100    13K       -      100  128
       NFS fh     1     1K       -      584  32
       linker   207  1052K       -      234  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
        lockf    29     3K       -    20042  64,128
   loginclass     2     1K       -     1192  64
       devbuf 17205 36362K       -    17523  16,32,64,128,256,512,1024,2048,4096,8192,65536
         temp   149    51K       -  1280113  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
       ip6opt     5     2K       -        6  256
       ip6ndp    27     2K       -       27  64,128
       module   230    29K       -      230  128
     mtx_pool     2    16K       -        2  8192
          osd     3     1K       -        5  16,32,64
     pmchooks     1     1K       -        1  128
         pgrp    30     4K       -     2222  128
      session    29     4K       -     2187  128
         proc     2    32K       -        2  16384
      subproc   211   368K       -  1099014  512,4096
         cred   204    32K       -  6025704  64,256
       plimit    19     5K       -     3985  256
      uidinfo     9     5K       -    11892  128,4096
 NFSD session     1     1K       -        1  1024
       sysctl     0     0K       -    63851  16,32,64
    sysctloid  7196   365K       -     7369  16,32,64,128
    sysctltmp     0     0K       -    17834  16,32,64,128
      tidhash     1    32K       -        1  32768
      callout     5  2184K       -        5
         umtx   522    66K       -      522  128
     p1003.1b     1     1K       -        1  16
         SWAP     2   549K       -        2  64
          bus   802    86K       -     6536  16,32,64,128,256,1024
       bus-sc    57  1671K       -     2431  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
    newnfsmnt     1     1K       -        1  1024
      devstat     8    17K       -        8  32,4096
 eventhandler   116    10K       -      116  64,128
         kobj   124   496K       -      296  4096
     acpiintr     1     1K       -        1  64
      Per-cpu     1     1K       -        1  32
       acpica 14355  1420K       -   216546  16,32,64,128,256,512,1024,2048,4096
     pci_link    16     2K       -       16  64,128
    pfs_nodes    21     6K       -       21  256
         rman   316    37K       -      716  16,32,128
         sbuf     1     1K       -    41375  16,32,64,128,256,512,1024,2048,4096,8192,16384
       sglist     8     8K       -        8  1024
         GEOM    88    15K       -     1871  16,32,64,128,256,512,1024,2048,8192,16384
      acpipwr     5     1K       -        5  64
    taskqueue    43     7K       -       43  16,32,256
       Unitno    22     2K       -  1208250  32,64
         vmem     3   144K       -        6  1024,4096,8192
     ioctlops     0     0K       -   185700  256,512,1024,2048,4096
       select    89    12K       -       89  128
          iov     0     0K       - 19808992  16,64,128,256,512,1024
          msg     4    30K       -        4  2048,4096,8192,16384
          sem     4   106K       -        4  2048,4096
          shm     1    32K       -        1  32768
          tty    20    20K       -      499  1024
          pts     1     1K       -      480  256
         accf     2     1K       -        2  64
     mbuf_tag     0     0K       - 291472282  32,64,128
        shmfd     1     8K       -        1  8192
       soname    32     4K       -  1210442  16,32,128
          pcb    36   663K       -    76872  16,32,64,128,1024,2048,8192
      CAM CCB     0     0K       -   182128  2048
          acl     0     0K       -        2  4096
     vfscache     1  2048K       -        1
   cl_savebuf     0     0K       -      480  64
     vfs_hash     1  1024K       -        1
       vnodes     1     1K       -        1  256
      entropy  1026    65K       -    49107  32,64,4096
        mount    64     3K       -      140  16,32,64,128,256
  vnodemarker     0     0K       -     4212  512
          BPF   112 20504K       -      131  16,64,128,512,4096
     CAM path    11     1K       -       63  32
        ifnet    29    57K       -       30  128,256,2048
       ifaddr   315   105K       -      315  32,64,128,256,512,2048,4096
  ether_multi   232    13K       -      282  16,32,64
        clone    10     2K       -       10  128
       arpcom    23     1K       -       23  16
          gif     4     1K       -        4  32,256
      lltable   155    53K       -      551  256,512
         UART     6     5K       -        6  16,1024
         vlan    56     5K       -       74  64,128
     acpitask     1    16K       -        1  16384
      acpisem   110    14K       -      110  128
    raid_data     0     0K       -      108  32,128,256
     routetbl   516   136K       -   101735  32,64,128,256,512
         igmp    28     7K       -       28  256
         CARP    76    30K       -       83  16,32,64,128,256,512,1024
         ipid     2    24K       -        2  8192,16384
   in_mfilter   112   112K       -      112  1024
     in_multi    43    11K       -       43  256
  ip_moptions   224    35K       -      224  64,256
   CAM periph     7     2K       -       19  16,32,64,128,256
      acpidev   128     8K       -      128  64
    CAM queue    15     5K       -       39  16,32,512
encap_export_host     4     4K       -        4  1024
    sctp_a_it     0     0K       -       36  16
     sctp_vrf     1     1K       -        1  64
     sctp_ifa   115    15K       -      204  128
     sctp_ifn    21     3K       -       23  128
    sctp_iter     0     0K       -       36  256
    hostcache     1    32K       -        1  32768
     syncache     1    64K       -        1  65536
  in6_mfilter     1     1K       -        1  1024
    in6_multi    15     2K       -       15  32,256
 ip6_moptions     2     1K       -        2  32,256
CAM dev queue     6     1K       -        6  64
       kbdmux     6    22K       -        6  16,512,1024,2048,16384
          mld    26     4K       -       26  128
          LED    20     2K       -       20  16,128
  inpcbpolicy   365    12K       -   119277  32
     secasvar     7     2K       -      214  256
       sahead    10     3K       -       10  256
  ipsecpolicy   748   187K       -   241562  256
 ipsecrequest    18     3K       -       72  128
   ipsec-misc    56     2K       -     1712  16,32,64
    ipsec-saq     0     0K       -       24  128
    ipsec-reg     3     1K       -        3  32
       pfsync     2     2K       -      893  32,256,1024
      pf_temp     0     0K       -       78  128
      pf_hash     3  2880K       -        3
     pf_ifnet    36    11K       -     9510  256,2048
       pf_tag     7     1K       -        7  128
      pf_altq     5     2K       -      125  256
      pf_rule   964   904K       -    17500  128,1024
      pf_osfp  1130   115K       -    28250  64,128
     pf_table    49    98K       -      948  2048
       crypto    37    11K       -     1072  64,128,256,512,1024
        xform     7     1K       -  1530156  16,32,64,128,256
          rpc    12    20K       -      304  64,128,512,1024,8192
audit_evclass   187     6K       -      231  32
  ufs_dirhash    93    18K       -       93  16,32,64,128,256,512
    ufs_quota     1  1024K       -        1
    ufs_mount     3    13K       -        3  512,4096,8192
    vm_pgdata     2   513K       -        2  128
      UMAHash     5     6K       -       10  512,1024,2048
      CAM SIM     6     2K       -        6  256
      CAM XPT    30     3K       -     1850  16,32,64,128,256,512,1024,2048,65536
      CAM DEV     9    18K       -       16  2048
  fpukern_ctx     3     6K       -        3  2048
      memdesc     1     4K       -        1  4096
          USB    23    33K       -       24  16,128,256,512,1024,2048,4096
       DEVFS3   136    34K       -     2027  256
       DEVFS1   108    54K       -      594  512
       apmdev     1     1K       -        1  128
   madt_table     0     0K       -        1  4096
   DEVFS_RULE    55    26K       -       55  64,512
        DEVFS    12     1K       -       13  16,128
       DEVFSP    22     2K       -      167  64
      io_apic     1     2K       -        1  2048
       isadev     8     1K       -        8  128
          MCA    15     2K       -       15  32,128
          msi    30     4K       -       30  128
     nexusdev     5     1K       -        5  16
       USBdev    21     8K       -       21  32,64,128,256,512,1024,4096
NFSD V4client     1     1K       -        1  256
         cdev     5     2K       -        5  256
        cxgbe    41   956K       -       44  128,256,512,1024,2048,4096,8192,16384
         ipmi     0     0K       -    20155  128,2048
    htcp data   127     4K       -    13675  32
   aesni_data     3     3K       -        3  1024
      solaris   142 12302K       -     3189  16,32,64,128,512,1024,8192
   kstat_data     6     1K       -        6  64
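`vmstat -m` reports malloc(9) types, not UMA zones, so the mbufs themselves are counted by `vmstat -z` rather than here. Still, ranking malloc types by MemUse makes it easier to check whether any of them is growing alongside the mbuf count. A minimal sketch, assuming the six-column layout above with MemUse always printed in K; the regex and function name are hypothetical. Type names can contain spaces ("NFSD lckfile", "CAM dev queue"), so the name is matched lazily up to the first numeric columns.

```python
import re

# One row of `vmstat -m`: Type InUse MemUse HighUse Requests [Size(s)].
ROW_RE = re.compile(
    r"^\s*(?P<type>.+?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K\s+\S+"
    r"\s+(?P<requests>\d+)(?:\s+\S+)?\s*$"
)


def top_malloc_types(text, n=5):
    """Return the n malloc types with the largest MemUse, as (type, KiB) pairs."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:  # the header row and blank lines do not match
            rows.append((m.group("type"), int(m.group("memuse"))))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


sample = """\
         Type InUse MemUse HighUse Requests  Size(s)
 NFSD lckfile     1     1K       -        1  256
       devbuf 17205 36362K       -    17523  16,32,64,128
      callout     5  2184K       -        5
          BPF   112 20504K       -      131  16,64,128,512,4096
CAM dev queue     6     1K       -        6  64
      solaris   142 12302K       -     3189  16,32,64,128,512,1024,8192
"""
print(top_malloc_types(sample, n=3))
```

Comparing this ranking between two snapshots taken hours apart would show whether any malloc type grows in step with the mbuf zone.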

TCP States:

https://i.stack.imgur.com/G7850.png


--
 <http://www.goboomtown.com/>
Josh Gitlin
Senior Full Stack Developer
(415) 690-1610 x155

Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.



