From: Julien Cigar
To: "Andrey V. Elsukov"
Cc: Ryan Stone, Ben RUBSON, FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Date: Mon, 26 Jun 2017 15:44:58 +0200
Message-ID: <20170626134458.GT43966@mordor.lan>
In-Reply-To: <64abec26-e310-d66d-93ae-3536914ddd84@yandex.ru>

On Mon, Jun 26, 2017 at 04:13:33PM +0300, Andrey V. Elsukov wrote:
> On 25.06.2017 18:32, Ryan Stone wrote:
> > Having looked at the original email more closely, I see that you showed an
> > mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf
> > clusters increase while you are far below the zone's limit means that
> > you're definitely running into the bug I'm describing, and this bug could
> > plausibly cause the iSCSI errors that you describe.
> >
> > The issue is that the newer version of the driver tries to allocate a
> > single buffer to accommodate an MTU-sized packet. Over time, however,
> > memory will become fragmented and eventually it can become impossible to
> > allocate a 9k physically contiguous buffer. When this happens the driver
> > is unable to allocate buffers to receive packets and is forced to drop
> > them. Presumably, if iSCSI suffers too many packet drops it will terminate
> > the connection. The older version of the driver limited itself to
> > page-sized buffers, so it was immune to issues with memory fragmentation.
>
> I think it is not an mlxen-specific problem, we have the same symptoms with
> the ixgbe(4) driver too. To avoid the problem we have patches that
> disable the use of 9k mbufs, and instead only use 4k mbufs.

I had the same issue on a lightly loaded HP DL20 machine (BCM5720
chipsets, 8 GB of RAM) running 10.3. The problem usually shows up within
30 days, as 9k jumbo cluster allocation failures.

>
> --
> WBR, Andrey V. Elsukov
>

--
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
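
The fragmentation argument quoted in this message can be illustrated with a
toy model. The sketch below is not driver or kernel code. It assumes 4 KiB
pages and a 9 KiB cluster (so three physically contiguous pages per buffer),
and it uses a made-up memory map in which every third page is held by a
long-lived allocation. All function names are hypothetical.

```python
PAGE_SIZE = 4096        # assumed page size in bytes
JUMBO_9K = 9 * 1024     # assumed size of one 9k cluster in bytes


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


def find_contiguous(free_map, npages):
    """Return the index of the first run of npages free pages, or None."""
    run = 0
    for i, is_free in enumerate(free_map):
        run = run + 1 if is_free else 0
        if run == npages:
            return i - npages + 1
    return None


def fragmented_memory(total_pages, pinned_every):
    """Toy memory map: True means free. Every pinned_every-th page is in use."""
    return [(i % pinned_every) != 0 for i in range(total_pages)]


if __name__ == "__main__":
    mem = fragmented_memory(total_pages=1024, pinned_every=3)
    need = pages_needed(JUMBO_9K)
    print("free pages:", sum(mem), "of", len(mem))
    print("pages per 9k cluster:", need)
    print("9k-style allocation:", find_contiguous(mem, need))
    print("page-sized allocation:", find_contiguous(mem, 1))
```

In this model about two thirds of memory is free, yet the three-page request
fails while a page-sized request succeeds. That mirrors the behaviour
described above: a driver limited to page-sized buffers is not affected by
fragmentation, whereas one that needs a contiguous 9k buffer can fail even
when the zone is far below its limit.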