Subject: mbuf_jumbo_9k & iSCSI failing
From: Ben RUBSON <ben.rubson@gmail.com>
In-Reply-To: <52A2608C-A57E-4E75-A952-F4776BA23CA4@gmail.com>
Date: Sun, 25 Jun 2017 16:54:25 +0200
Message-Id: <9B507AA6-40FE-4B8D-853F-2A9422A2DF67@gmail.com>
References: <486A6DA0-54C8-40DF-8437-F6E382DA01A8@gmail.com>
 <6a31ef00-5f7a-d36e-d5e6-0414e8b813c7@selasky.org>
 <DB3PR05MB089A5789A0A619FA8B7CA36C36C0@DB3PR05MB089.eurprd05.prod.outlook.com>
 <613AFD8E-72B2-4E3F-9C70-1D1E43109B8A@gmail.com>
 <2c9a9c2652a74d8eb4b34f5a32c7ad5c@AM5PR0502MB2916.eurprd05.prod.outlook.com>
 <DB3PR05MB089011A41EF87A40C7AC741C36E0@DB3PR05MB089.eurprd05.prod.outlook.com>
 <F19B51C7-7DDD-4FAB-9091-0B7C8A7CE649@gmail.com>
 <52A2608C-A57E-4E75-A952-F4776BA23CA4@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>,
 freebsd-scsi@freebsd.org

> On 30 Dec 2016, at 22:55, Ben RUBSON <ben.rubson@gmail.com> wrote:
>
> Hello,
>
> 2 FreeBSD 11.0-p3 servers, one iSCSI initiator, one target.
> Both with Mellanox ConnectX-3 40G.
>
> Since a few days, sometimes, under undetermined circumstances, as soon as there is some (very low) iSCSI traffic, some of the disks get disconnected :
> kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection
>
> At the same moment, sysctl counters hw.mlxen1.stat.rx_ring*.error grow on initiator side.
>
> I then tried to reproduce these network errors burning the link at 40G full-duplex using iPerf.
> But I did not manage to increase these error counters.
>
> It's strange because it's a sporadic issue, I can have traffic on iSCSI disks without any issue, and sometimes, they get disconnected with errors growing.

> On 01 Jan 2017, at 09:16, Meny Yossefi <menyy@mellanox.com> wrote:
>
> Any chance you ran out of mbufs in the system?

> On 02 Jan 2017, at 12:09, Ben RUBSON <ben.rubson@gmail.com> wrote:
>
> I think you are right, this could be a mbufs issue.
> Here are some more numbers :
>
> # vmstat -z | grep -v "0,   0$"
> ITEM                   SIZE   LIMIT     USED     FREE         REQ      FAIL SLEEP
> 4 Bucket:                32,      0,    2673,   28327,   88449799,    17317, 0
> 8 Bucket:                64,      0,     449,   15609,   13926386,     4871, 0
> 12 Bucket:               96,      0,     335,    5323,   10293892,   142872, 0
> 16 Bucket:              128,      0,     533,    6070,    7618615,   472647, 0
> 32 Bucket:              256,      0,    8317,   22133,   36020376,   563479, 0
> 64 Bucket:              512,      0,    1238,    3298,   20138111, 11430742, 0
> 128 Bucket:            1024,      0,    1865,    2963,   21162182,   158752, 0
> 256 Bucket:            2048,      0,    1626,     450,   80253784,  4890164, 0
> mbuf_jumbo_9k:         9216, 603712,   16400,    8744, 4128521064,     2661, 0

> On 03 Jan 2017, at 07:27, Meny Yossefi <menyy@mellanox.com> wrote:
>
> Have you tried increasing the mbufs limit?
> (sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)

> On 04 Jan 2017, at 14:47, Ben RUBSON <ben.rubson@gmail.com> wrote:
>
> No I did not try this yet.
> However, from the numbers above (and below), I think I should increase kern.ipc.nmbjumbo9 instead ?

> On 30 Jan 2017, at 15:36, Ben RUBSON <ben.rubson@gmail.com> wrote:
>
> So, to give some news, increasing kern.ipc.nmbjumbo9 helped a lot.
> Just a very little issue (compared to the others before) over the last 3 weeks.



Hello,

I'm back today with this issue.
Above is my discussion with Meny from Mellanox at the beginning of 2017.
(topic was "iSCSI failing, MLX rx_ring errors ?", on freebsd-net list)

This morning the issue came back: some of my iSCSI disks were disconnected.
Below are some numbers.



# vmstat -z | grep -v "0,   0$"
ITEM           SIZE    LIMIT     USED     FREE         REQ     FAIL SLEEP
8 Bucket:        64,       0,     654,    8522,   28604967,      11, 0
12 Bucket:       96,       0,     976,    5092,   23758734,      78, 0
32 Bucket:      256,       0,     789,    4491,   43446969,     137, 0
64 Bucket:      512,       0,     666,    2750,   47568959, 1272018, 0
128 Bucket:    1024,       0,    1047,    1249,   28774042,  232504, 0
256 Bucket:    2048,       0,    1611,     369,  139988097, 8931139, 0
vmem btag:       56,       0, 2949738,   15506,   18092235,   20908, 0
mbuf_jumbo_9k: 9216, 2037529,   16400,    8776, 8610737115,     297, 0
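
To put the mbuf_jumbo_9k row above in perspective, here is a minimal sketch that parses that one line and turns the item counts into bytes. The helper name parse_vmstat_z is made up for illustration, and the column order is simply taken from the header printed above. It only restates the numbers already shown (allocations fail although USED + FREE is a small fraction of LIMIT); it does not explain why.

```python
# Sketch only: interpret one `vmstat -z` row copied from the output above.
LINE = "mbuf_jumbo_9k: 9216, 2037529,   16400,    8776, 8610737115,     297, 0"


def parse_vmstat_z(line):
    """Split 'NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP' into a dict.

    Hypothetical helper; the field order follows the header shown by vmstat -z.
    """
    name, _, rest = line.partition(":")
    keys = ("size", "limit", "used", "free", "req", "fail", "sleep")
    values = [int(v) for v in rest.replace(",", " ").split()]
    row = dict(zip(keys, values))
    row["name"] = name.strip()
    return row


row = parse_vmstat_z(LINE)

used_bytes = row["used"] * row["size"]      # memory held by in-use 9k clusters
limit_bytes = row["limit"] * row["size"]    # memory the zone could grow to
cached_items = row["used"] + row["free"]    # items currently backed by memory

print(f"{row['name']}: {used_bytes / 2**20:.1f} MiB in use, "
      f"limit {limit_bytes / 2**30:.2f} GiB")
print(f"items held by the zone: {cached_items} "
      f"({100 * cached_items / row['limit']:.2f}% of LIMIT)")
print(f"failed requests: {row['fail']} out of {row['req']}")
```

So the zone is nowhere near its configured limit when the failures are counted.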

# uname -rs
FreeBSD 11.0-RELEASE-p8

# uptime
 3:34p.m.  up 88 days, 15:57, 2 users, load averages: 0.95, 0.67, 0.62

# grep kern.ipc.nmb /boot/loader.conf
kern.ipc.nmbjumbo9=2037529
kern.ipc.nmbjumbo16=1

# sysctl kern.ipc | grep mb
kern.ipc.nmbufs: 26080380
kern.ipc.nmbjumbo16: 4
kern.ipc.nmbjumbo9: 6112587
kern.ipc.nmbjumbop: 2037529
kern.ipc.nmbclusters: 4075060
kern.ipc.maxmbufmem: 33382887424
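
One detail in the numbers above: the values reported by sysctl differ from those set in /boot/loader.conf, while the LIMIT column of vmstat -z matches loader.conf. The short sketch below just computes the ratios. The 4096-byte page size and the 16384-byte size of a 16k cluster are assumptions (neither appears in the output above), and the match with "pages per cluster" is an observation only, not a confirmed explanation.

```python
import math

PAGE_SIZE = 4096  # assumption: 4 KiB pages; not stated anywhere in this post

# Values copied from /boot/loader.conf and from `sysctl kern.ipc` above.
loader_conf = {"kern.ipc.nmbjumbo9": 2037529, "kern.ipc.nmbjumbo16": 1}
sysctl_now = {"kern.ipc.nmbjumbo9": 6112587, "kern.ipc.nmbjumbo16": 4}

# 9216 is the SIZE column of mbuf_jumbo_9k; 16384 is an assumed 16k cluster size.
cluster_bytes = {"kern.ipc.nmbjumbo9": 9216, "kern.ipc.nmbjumbo16": 16384}

ratios = {}
pages_per_cluster = {}
for name, configured in loader_conf.items():
    quotient, remainder = divmod(sysctl_now[name], configured)
    ratios[name] = (quotient, remainder)
    pages_per_cluster[name] = math.ceil(cluster_bytes[name] / PAGE_SIZE)
    print(f"{name}: sysctl / loader.conf = {quotient} (remainder {remainder}), "
          f"pages per cluster = {pages_per_cluster[name]}")
```

Either way, the limit that vmstat -z enforces for mbuf_jumbo_9k is the 2037529 from loader.conf.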

# ifconfig mlxen1
mlxen1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9020
options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
status: active



I just caught the issue while it was happening, with the FAIL counter growing:

# vmstat -z | grep mbuf_jumbo_9k
ITEM           SIZE  LIMIT     USED  FREE        REQ FAIL SLEEP
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0
mbuf_jumbo_9k: 9216, 2037529, 16411, 7320,8735286748, 665, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735298937, 667, 0
mbuf_jumbo_9k: 9216, 2037529, 16438, 7293,8735337634, 667, 0
mbuf_jumbo_9k: 9216, 2037529, 16407, 7324,8735354339, 668, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735382105, 669, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735392836, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735423910, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735456393, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16409, 7322,8735472284, 672, 0
mbuf_jumbo_9k: 9216, 2037529, 16420, 7311,8735512237, 673, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735518502, 675, 0
mbuf_jumbo_9k: 9216, 2037529, 16410, 7321,8735543668, 676, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8735555646, 678, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735568986, 679, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8735579075, 680, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735603983, 681, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735634273, 681, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735646057, 683, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735658213, 684, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8735675678, 686, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735686017, 687, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735707335, 687, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8736016546, 708, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736037292, 709, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8736053865, 710, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8736070103, 711, 0
mbuf_jumbo_9k: 9216, 2037529, 16407, 7324,8736086810, 711, 0
mbuf_jumbo_9k: 9216, 2037529, 16430, 7301,8736098568, 713, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8736122803, 714, 0
mbuf_jumbo_9k: 9216, 2037529, 16417, 7314,8736134322, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736152338, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16403, 7328,8736167677, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736170783, 717, 0
mbuf_jumbo_9k: 9216, 2037529, 16445, 7286,8736546084, 733, 0

During this, top was reporting the following :
Mem: 4056K Active, 426M Inact, 59G Wired, 2531M Free

And in /var/log/messages :
kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection



Any idea why I'm experiencing this ?

Thank you very much for your help & support,

Best regards,

Ben