Date:      Thu, 28 Sep 2017 16:32:12 -0400
From:      Josh Gitlin <jgitlin@goboomtown.com>
To:        Alan Somers <asomers@freebsd.org>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: Help with mbuf exhaustion
Message-ID:  <507664F2-8215-4D8B-B474-EA2E8B46D1AD@goboomtown.com>
In-Reply-To: <CAOtMX2j7k7GLO2hm-QNJ9yef1V5WMP9SVbQs0p%2Bg7RJOabg-5w@mail.gmail.com>
References:  <322F6F4B-1153-4ECE-B854-B2981B0CDDF2@goboomtown.com> <CAOtMX2j7k7GLO2hm-QNJ9yef1V5WMP9SVbQs0p%2Bg7RJOabg-5w@mail.gmail.com>

My mistake, the "1" was cut off from my message. We are actually on
FreeBSD 10.3-RELEASE-p21, _not_ p2.

--
<http://www.goboomtown.com/>
Josh Gitlin
Senior Full Stack Developer
(415) 690-1610 x155

Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.

> On Sep 28, 2017, at 4:30 PM, Alan Somers <asomers@freebsd.org> wrote:
>
> First of all, 10.3-RELEASE-p2 is very old and has known security
> vulnerabilities.  Have you tried 10.3-RELEASE-p21 or even 10.4-RELEASE?
>
> On Thu, Sep 28, 2017 at 1:30 PM, Josh Gitlin <jgitlin@goboomtown.com> wrote:
>> Hi FreeBSD Gurus!
>>
>> We're having an issue with mbuf exhaustion on a FreeBSD server which
>> was recently upgraded from 10.3-STABLE to 10.3-RELEASE-p2. In the
>> course of normal operation, we see mbuf usage steadily increasing
>> until we reach the kern.ipc.nmbufs limit, at which point the machine
>> becomes unresponsive over the network (due to lack of mbufs for
>> network access) and the console displays:
>>
>> cxl0: Interface stopped DISTRIBUTING, possible flapping
>> cxl1: Interface stopped DISTRIBUTING, possible flapping
>> [zone: mbuf] kern.ipc.nmbufs limit reached
>> [zone: mbuf] kern.ipc.nmbufs limit reached
>>
>> The machine runs pf and acts as a packet filter, router, gateway and
>> DHCP/DNS server. It has two Chelsio NICs in it, and is a CARP master
>> with a secondary. The secondary has an identical hardware and
>> software configuration and does not exhibit this issue.
>>
>> Given the downtime this causes, we set up our Nagios/Check_MK to
>> graph the output of `netstat -m` and alert when the number of mbufs
>> in use approaches `kern.ipc.nmbufs`. We see a steady linear increase
>> in mbuf usage until we reboot:
>>
>> https://i.stack.imgur.com/8bzAq.png
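
For anyone wanting to reproduce the check described above, here is a
minimal sketch in Python of the kind of calculation involved. It is not
the actual Check_MK plugin: the function name mbuf_usage and the 80%
warning threshold are hypothetical, and the limit of 1587540 is the
LIMIT that vmstat -z reports for the mbuf zone in this message, which
is assumed here to match kern.ipc.nmbufs.

```python
import re

# Hypothetical alert threshold; tune to taste.
WARN_FRACTION = 0.8


def mbuf_usage(netstat_output: str, nmbufs_limit: int) -> float:
    """Return current mbufs in use as a fraction of the given limit."""
    # The relevant `netstat -m` line looks like:
    #   679270/3080/682350 mbufs in use (current/cache/total)
    match = re.search(r"^(\d+)/\d+/\d+ mbufs in use", netstat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'mbufs in use' line found")
    return int(match.group(1)) / nmbufs_limit


# Sample line copied from the netstat -m output in this message.
sample = "679270/3080/682350 mbufs in use (current/cache/total)\n"

# Assumption: kern.ipc.nmbufs equals the mbuf zone LIMIT from vmstat -z.
usage = mbuf_usage(sample, 1587540)
status = "WARN" if usage >= WARN_FRACTION else "OK"
print(f"{status}: {usage:.1%} of the mbuf limit in use")
```

On a live system the sample string would come from running netstat -m
and the limit from sysctl kern.ipc.nmbufs.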
>>
>> The number of mbuf *clusters* in use does not change when this
>> happens, and increasing the mbuf cluster limits has no effect:
>>
>> https://i.stack.imgur.com/7OzdN.png
>>
>> This looks like a kernel bug of some sort to me. I'm looking for
>> advice on further troubleshooting, or assistance in resolving this!
>>
>> Helpful (maybe) information:
>>
>> netstat -m:
>>
>> 679270/3080/682350 mbufs in use (current/cache/total)
>> 10243/1657/11900/985360 mbuf clusters in use (current/cache/total/max)
>> 10243/1648 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 8128/482/8610/124025 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 0/0/0/36748 9k jumbo clusters in use (current/cache/total/max)
>> 128/0/128/20670 16k jumbo clusters in use (current/cache/total/max)
>> 224863K/6012K/230875K bytes allocated to network (current/cache/total)
>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>>
>> vmstat -z|grep -E '^ITEM|mbuf':
>>
>> ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
>> mbuf_packet:            256, 1587540,   10239,    1652,84058893,   0,   0
>> mbuf:                   256, 1587540,  671533,    1206,914478880,   0,   0
>> mbuf_cluster:          2048, 985360,   11891,       9,   11891,   0,   0
>> mbuf_jumbo_page:       4096, 124025,    8128,     512,15011847,   0,   0
>> mbuf_jumbo_9k:         9216,  36748,       0,       0,       0,   0,   0
>> mbuf_jumbo_16k:       16384,  20670,     128,       0,     128,   0,   0
>> mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
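
To compare the zones at a glance, the vmstat -z rows above can be
turned into used/limit figures. The following is only a sketch under
stated assumptions: the function name parse_vmstat_z and the column
names are my own, and the parser assumes the seven comma-separated
columns shown in this output.

```python
def parse_vmstat_z(text: str) -> dict:
    """Map each zone name to a dict of its integer columns."""
    columns = ("size", "limit", "used", "free", "req", "fail", "sleep")
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # header or blank line: no "zone:" prefix
        values = [v.strip() for v in rest.split(",")]
        if len(values) != len(columns):
            continue  # not a row in the expected seven-column layout
        zones[name.strip()] = dict(zip(columns, map(int, values)))
    return zones


# Rows copied from the vmstat -z output in this message.
sample = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256, 1587540,   10239,    1652,84058893,   0,   0
mbuf:                   256, 1587540,  671533,    1206,914478880,   0,   0
mbuf_cluster:          2048, 985360,   11891,       9,   11891,   0,   0
"""

zones = parse_vmstat_z(sample)
for name, z in zones.items():
    if z["limit"]:  # a LIMIT of 0 means no limit is set, so skip the ratio
        print(f"{name}: {z['used']}/{z['limit']} ({z['used'] / z['limit']:.1%})")
```

With the figures above, this makes it easy to see that the plain mbuf
zone accounts for nearly all of the usage, while mbuf_packet and
mbuf_cluster stay small.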
>>
>> vmstat -m:
>>
>>         Type InUse MemUse HighUse Requests  Size(s)
>> NFSD lckfile     1     1K       -        1  256
>>     filedesc   103   383K       -  1134731  16,32,128,2048,4096,8192,16384,65536
>>        sigio     1     1K       -        1  64
>>     filecaps     0     0K       -      973  64
>>      kdtrace   292    59K       -  1099386  64,256
>>         kenv   121    13K       -      125  16,32,64,128,8192
>>       kqueue    14    22K       -     5374  256,2048,8192
>>    proc-args    54     5K       -   578448  16,32,64,128,256
>>        hhook     2     1K       -        2  256
>>      ithread   146    24K       -      146  32,128,256
>>       KTRACE   100    13K       -      100  128
>>       NFS fh     1     1K       -      584  32
>>       linker   207  1052K       -      234  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>>        lockf    29     3K       -    20042  64,128
>>   loginclass     2     1K       -     1192  64
>>       devbuf 17205 36362K       -    17523  16,32,64,128,256,512,1024,2048,4096,8192,65536
>>         temp   149    51K       -  1280113  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>>       ip6opt     5     2K       -        6  256
>>       ip6ndp    27     2K       -       27  64,128
>>       module   230    29K       -      230  128
>>     mtx_pool     2    16K       -        2  8192
>>          osd     3     1K       -        5  16,32,64
>>     pmchooks     1     1K       -        1  128
>>         pgrp    30     4K       -     2222  128
>>      session    29     4K       -     2187  128
>>         proc     2    32K       -        2  16384
>>      subproc   211   368K       -  1099014  512,4096
>>         cred   204    32K       -  6025704  64,256
>>       plimit    19     5K       -     3985  256
>>      uidinfo     9     5K       -    11892  128,4096
>> NFSD session     1     1K       -        1  1024
>>       sysctl     0     0K       -    63851  16,32,64
>>    sysctloid  7196   365K       -     7369  16,32,64,128
>>    sysctltmp     0     0K       -    17834  16,32,64,128
>>      tidhash     1    32K       -        1  32768
>>      callout     5  2184K       -        5
>>         umtx   522    66K       -      522  128
>>     p1003.1b     1     1K       -        1  16
>>         SWAP     2   549K       -        2  64
>>          bus   802    86K       -     6536  16,32,64,128,256,1024
>>       bus-sc    57  1671K       -     2431  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>>    newnfsmnt     1     1K       -        1  1024
>>      devstat     8    17K       -        8  32,4096
>> eventhandler   116    10K       -      116  64,128
>>         kobj   124   496K       -      296  4096
>>     acpiintr     1     1K       -        1  64
>>      Per-cpu     1     1K       -        1  32
>>       acpica 14355  1420K       -   216546  16,32,64,128,256,512,1024,2048,4096
>>     pci_link    16     2K       -       16  64,128
>>    pfs_nodes    21     6K       -       21  256
>>         rman   316    37K       -      716  16,32,128
>>         sbuf     1     1K       -    41375  16,32,64,128,256,512,1024,2048,4096,8192,16384
>>       sglist     8     8K       -        8  1024
>>         GEOM    88    15K       -     1871  16,32,64,128,256,512,1024,2048,8192,16384
>>      acpipwr     5     1K       -        5  64
>>    taskqueue    43     7K       -       43  16,32,256
>>       Unitno    22     2K       -  1208250  32,64
>>         vmem     3   144K       -        6  1024,4096,8192
>>     ioctlops     0     0K       -   185700  256,512,1024,2048,4096
>>       select    89    12K       -       89  128
>>          iov     0     0K       - 19808992  16,64,128,256,512,1024
>>          msg     4    30K       -        4  2048,4096,8192,16384
>>          sem     4   106K       -        4  2048,4096
>>          shm     1    32K       -        1  32768
>>          tty    20    20K       -      499  1024
>>          pts     1     1K       -      480  256
>>         accf     2     1K       -        2  64
>>     mbuf_tag     0     0K       - 291472282  32,64,128
>>        shmfd     1     8K       -        1  8192
>>       soname    32     4K       -  1210442  16,32,128
>>          pcb    36   663K       -    76872  16,32,64,128,1024,2048,8192
>>      CAM CCB     0     0K       -   182128  2048
>>          acl     0     0K       -        2  4096
>>     vfscache     1  2048K       -        1
>>   cl_savebuf     0     0K       -      480  64
>>     vfs_hash     1  1024K       -        1
>>       vnodes     1     1K       -        1  256
>>      entropy  1026    65K       -    49107  32,64,4096
>>        mount    64     3K       -      140  16,32,64,128,256
>>  vnodemarker     0     0K       -     4212  512
>>          BPF   112 20504K       -      131  16,64,128,512,4096
>>     CAM path    11     1K       -       63  32
>>        ifnet    29    57K       -       30  128,256,2048
>>       ifaddr   315   105K       -      315  32,64,128,256,512,2048,4096
>>  ether_multi   232    13K       -      282  16,32,64
>>        clone    10     2K       -       10  128
>>       arpcom    23     1K       -       23  16
>>          gif     4     1K       -        4  32,256
>>      lltable   155    53K       -      551  256,512
>>         UART     6     5K       -        6  16,1024
>>         vlan    56     5K       -       74  64,128
>>     acpitask     1    16K       -        1  16384
>>      acpisem   110    14K       -      110  128
>>    raid_data     0     0K       -      108  32,128,256
>>     routetbl   516   136K       -   101735  32,64,128,256,512
>>         igmp    28     7K       -       28  256
>>         CARP    76    30K       -       83  16,32,64,128,256,512,1024
>>         ipid     2    24K       -        2  8192,16384
>>   in_mfilter   112   112K       -      112  1024
>>     in_multi    43    11K       -       43  256
>>  ip_moptions   224    35K       -      224  64,256
>>   CAM periph     7     2K       -       19  16,32,64,128,256
>>      acpidev   128     8K       -      128  64
>>    CAM queue    15     5K       -       39  16,32,512
>> encap_export_host     4     4K       -        4  1024
>>    sctp_a_it     0     0K       -       36  16
>>     sctp_vrf     1     1K       -        1  64
>>     sctp_ifa   115    15K       -      204  128
>>     sctp_ifn    21     3K       -       23  128
>>    sctp_iter     0     0K       -       36  256
>>    hostcache     1    32K       -        1  32768
>>     syncache     1    64K       -        1  65536
>>  in6_mfilter     1     1K       -        1  1024
>>    in6_multi    15     2K       -       15  32,256
>> ip6_moptions     2     1K       -        2  32,256
>> CAM dev queue     6     1K       -        6  64
>>       kbdmux     6    22K       -        6  16,512,1024,2048,16384
>>          mld    26     4K       -       26  128
>>          LED    20     2K       -       20  16,128
>>  inpcbpolicy   365    12K       -   119277  32
>>     secasvar     7     2K       -      214  256
>>       sahead    10     3K       -       10  256
>>  ipsecpolicy   748   187K       -   241562  256
>> ipsecrequest    18     3K       -       72  128
>>   ipsec-misc    56     2K       -     1712  16,32,64
>>    ipsec-saq     0     0K       -       24  128
>>    ipsec-reg     3     1K       -        3  32
>>       pfsync     2     2K       -      893  32,256,1024
>>      pf_temp     0     0K       -       78  128
>>      pf_hash     3  2880K       -        3
>>     pf_ifnet    36    11K       -     9510  256,2048
>>       pf_tag     7     1K       -        7  128
>>      pf_altq     5     2K       -      125  256
>>      pf_rule   964   904K       -    17500  128,1024
>>      pf_osfp  1130   115K       -    28250  64,128
>>     pf_table    49    98K       -      948  2048
>>       crypto    37    11K       -     1072  64,128,256,512,1024
>>        xform     7     1K       -  1530156  16,32,64,128,256
>>          rpc    12    20K       -      304  64,128,512,1024,8192
>> audit_evclass   187     6K       -      231  32
>>  ufs_dirhash    93    18K       -       93  16,32,64,128,256,512
>>    ufs_quota     1  1024K       -        1
>>    ufs_mount     3    13K       -        3  512,4096,8192
>>    vm_pgdata     2   513K       -        2  128
>>      UMAHash     5     6K       -       10  512,1024,2048
>>      CAM SIM     6     2K       -        6  256
>>      CAM XPT    30     3K       -     1850  16,32,64,128,256,512,1024,2048,65536
>>      CAM DEV     9    18K       -       16  2048
>>  fpukern_ctx     3     6K       -        3  2048
>>      memdesc     1     4K       -        1  4096
>>          USB    23    33K       -       24  16,128,256,512,1024,2048,4096
>>       DEVFS3   136    34K       -     2027  256
>>       DEVFS1   108    54K       -      594  512
>>       apmdev     1     1K       -        1  128
>>   madt_table     0     0K       -        1  4096
>>   DEVFS_RULE    55    26K       -       55  64,512
>>        DEVFS    12     1K       -       13  16,128
>>       DEVFSP    22     2K       -      167  64
>>      io_apic     1     2K       -        1  2048
>>       isadev     8     1K       -        8  128
>>          MCA    15     2K       -       15  32,128
>>          msi    30     4K       -       30  128
>>     nexusdev     5     1K       -        5  16
>>       USBdev    21     8K       -       21  32,64,128,256,512,1024,4096
>> NFSD V4client     1     1K       -        1  256
>>         cdev     5     2K       -        5  256
>>        cxgbe    41   956K       -       44  128,256,512,1024,2048,4096,8192,16384
>>         ipmi     0     0K       -    20155  128,2048
>>    htcp data   127     4K       -    13675  32
>>   aesni_data     3     3K       -        3  1024
>>      solaris   142 12302K       -     3189  16,32,64,128,512,1024,8192
>>   kstat_data     6     1K       -        6  64
>>
>> TCP States:
>>
>> https://i.stack.imgur.com/G7850.png
>>
>> --
>> <http://www.goboomtown.com/>
>> Josh Gitlin
>> Senior Full Stack Developer
>> (415) 690-1610 x155
>>
>> Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"



