From: Meny Yossefi
To: "freebsd-net@freebsd.org", Ben RUBSON
CC: Hans Petter Selasky, Yuval Bason
Subject: FW: iSCSI failing, MLX rx_ring errors ?
Date: Tue, 3 Jan 2017 06:27:15 +0000

________________________________________
From: owner-freebsd-net@freebsd.org On Behalf Of Ben RUBSON
Sent: Monday, January 2, 2017 11:09:15 AM (UTC+00:00) Monrovia, Reykjavik
To: freebsd-net@freebsd.org
Cc: Meny Yossefi; Yuval Bason; Hans Petter Selasky
Subject: Re: iSCSI failing, MLX rx_ring errors ?

Hi Meny,

Thank you very much for your feedback.
I think you are right, this could be an mbuf issue.
Here are some more numbers:

# vmstat -z | grep -v "0, 0$"
ITEM             SIZE    LIMIT    USED    FREE         REQ      FAIL SLEEP
4 Bucket:          32,       0,   2673,  28327,   88449799,    17317,   0
8 Bucket:          64,       0,    449,  15609,   13926386,     4871,   0
12 Bucket:         96,       0,    335,   5323,   10293892,   142872,   0
16 Bucket:        128,       0,    533,   6070,    7618615,   472647,   0
32 Bucket:        256,       0,   8317,  22133,   36020376,   563479,   0
64 Bucket:        512,       0,   1238,   3298,   20138111, 11430742,   0
128 Bucket:      1024,       0,   1865,   2963,   21162182,   158752,   0
256 Bucket:      2048,       0,   1626,    450,   80253784,  4890164,   0
mbuf_jumbo_9k:   9216,  603712,  16400,   8744, 4128521064,     2661,   0

# netstat -m
32801/18814/51615 mbufs in use (current/cache/total)
16400/9810/26210/4075058 mbuf clusters in use (current/cache/total/max)
16400/9659 mbuf+clusters out of packet secondary zone in use (current/cache)
0/8647/8647/2037529 4k (page size) jumbo clusters in use (current/cache/total/max)
16400/8744/25144/603712 9k jumbo clusters in use (current/cache/total/max)
0/0/0/339588 16k jumbo clusters in use (current/cache/total/max)
188600K/137607K/326207K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/2661/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

I did not perform any mbuf tuning; the limits above are the ones FreeBSD chose itself.
This server has 64GB of memory.
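
For anyone reading along: the only non-zero "denied" counter in the netstat -m output above is the 9k jumbo cluster one (2661), which matches the FAIL column of mbuf_jumbo_9k in vmstat -z. Below is a minimal Python sketch of how those counters could be extracted for monitoring. The function names (parse_requests, nonzero_denials) are made up for illustration, and the sample text is simply copied from the output above; the line format is assumed to stay as shown there.

```python
import re

# Sample lines copied from the `netstat -m` output in this message.
SAMPLE = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/2661/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
"""

# Matches e.g. "0/2661/0 requests for jumbo clusters denied (4k/9k/16k)".
# Lines without a parenthesised label list are deliberately not matched.
LINE_RE = re.compile(
    r"^(?P<counts>\d+(?:/\d+)*) requests for (?P<what>.+?) "
    r"(?P<kind>denied|delayed) \((?P<labels>[^)]*)\)$"
)


def parse_requests(text):
    """Return {(what, kind): {label: count}} for denied/delayed lines."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        counts = [int(c) for c in match.group("counts").split("/")]
        labels = match.group("labels").split("/")
        if len(counts) != len(labels):
            continue  # unexpected layout, skip rather than guess
        key = (match.group("what"), match.group("kind"))
        result[key] = dict(zip(labels, counts))
    return result


def nonzero_denials(text):
    """Return a list of (what, label, count) for every denied counter > 0."""
    found = []
    for (what, kind), per_label in parse_requests(text).items():
        if kind != "denied":
            continue
        for label, count in per_label.items():
            if count:
                found.append((what, label, count))
    return found


if __name__ == "__main__":
    for what, label, count in nonzero_denials(SAMPLE):
        print(f"{count} denied requests for {what} ({label})")
```

Run periodically against live netstat -m output, something like this would give the mbuf graphs that are currently missing.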
It has a ZFS pool, for which I limit the ARC memory impact with:

vfs.zfs.arc_max=64424509440 #60G

The only thing I did is some TCP tuning to improve throughput over high-latency long-distance private links:

kern.ipc.maxsockbuf=7372800
net.inet.tcp.sendbuf_max=6553600
net.inet.tcp.recvbuf_max=6553600
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.recvbuf_inc=65536
net.inet.tcp.cc.algorithm=htcp

Here are some graphs of memory & ARC usage when the issue occurs.
The crosshair (vertical red line) is at the timestamp where I get the iSCSI disconnections.
https://postimg.org/gallery/1kkekrc4e/

What is strange is that each time the issue occurs there is around 1GB of free memory.
So FreeBSD should still be able to allocate some more mbufs?
Unfortunately I do not have graphs about mbufs.

What should I ideally do?

>> Have you tried increasing the mbufs limit?
(sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)

Thank you again,
Best regards,

Ben

> On 01 Jan 2017, at 09:16, Meny Yossefi wrote:
>
> Hi Ben,
>
> Those are not HW errors, note that:
>
> hw.mlxen1.stat.rx_dropped: 0
> hw.mlxen1.stat.rx_errors: 0
>
> It seems to be triggered when you are failing to allocate a replacement buffer.
> Any chance you ran out of mbufs in the system?
>
> en_rx.c:
>
> mlx4_en_process_rx_cq():
>
>     mb = mlx4_en_rx_mb(priv, rx_desc, mb_list, length);
>     if (!mb) {
>         ring->errors++;
>         goto next;
>     }
>
> mlx4_en_rx_mb() -> mlx4_en_complete_rx_desc():
>
>     /* Allocate a replacement page */
>     if (mlx4_en_alloc_buf(priv, rx_desc, mb_list, nr))
>         goto fail;
>
> -Meny
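
To make the quoted explanation concrete, here is a toy Python model of the receive path described above: every completed packet needs a replacement buffer, and when that allocation fails the per-ring error counter is bumped and the packet is skipped. This is only a sketch and not the driver's code. BufferPool, RxRing and process_rx_cq are hypothetical names, and the pool is a simple counter standing in for the mbuf allocator.

```python
class BufferPool:
    """Hypothetical stand-in for the mbuf allocator: a fixed stock of buffers."""

    def __init__(self, free):
        self.free = free

    def alloc(self):
        """Return True if a replacement buffer was available, else False."""
        if self.free == 0:
            return False
        self.free -= 1
        return True


class RxRing:
    """Holds the two counters this model cares about."""

    def __init__(self):
        self.errors = 0     # plays the role of ring->errors in the excerpt
        self.delivered = 0  # packets handed up the stack


def process_rx_cq(ring, pool, completions):
    """Process `completions` received packets against `pool`."""
    for _ in range(completions):
        if not pool.alloc():
            # No replacement buffer: count an error and move on,
            # like `ring->errors++; goto next;` in the quoted code.
            ring.errors += 1
            continue
        ring.delivered += 1


if __name__ == "__main__":
    ring = RxRing()
    process_rx_cq(ring, BufferPool(free=3), completions=5)
    print(f"delivered={ring.delivered} errors={ring.errors}")
```

In this model the errors counter grows purely because of allocation failures, with no hardware fault involved, which is consistent with hw.mlxen1.stat.rx_errors staying at 0 while the rx_ring error counters increase.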