From owner-freebsd-scsi@freebsd.org Sun Jun 25 14:54:31 2017
Subject: mbuf_jumbo_9k & iSCSI failing
From: Ben RUBSON
Date: Sun, 25 Jun 2017 16:54:25 +0200
To: FreeBSD Net, freebsd-scsi@freebsd.org
List-Id: SCSI subsystem

> On 30 Dec 2016, at 22:55, Ben RUBSON wrote:
>
> Hello,
>
> 2 FreeBSD 11.0-p3 servers, one iSCSI initiator, one target.
> Both with Mellanox ConnectX-3 40G.
>
> For a few days now, sometimes, under undetermined circumstances, as soon as there is some (very low) iSCSI traffic, some of the disks get disconnected:
> kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection
>
> At the same moment, the sysctl counters hw.mlxen1.stat.rx_ring*.error grow on the initiator side.
>
> I then tried to reproduce these network errors by saturating the link at 40G full-duplex using iPerf.
> But I did not manage to increase these error counters.
>
> It's strange because it's a sporadic issue: I can have traffic on the iSCSI disks without any issue, and sometimes they get disconnected with the errors growing.

> On 01 Jan 2017, at 09:16, Meny Yossefi wrote:
>
> Any chance you ran out of mbufs in the system?

> On 02 Jan 2017, at 12:09, Ben RUBSON wrote:
>
> I think you are right, this could be an mbuf issue.
> Here are some more numbers:
>
> # vmstat -z | grep -v "0, 0$"
> ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
> 4 Bucket: 32, 0, 2673, 28327, 88449799, 17317, 0
> 8 Bucket: 64, 0, 449, 15609, 13926386, 4871, 0
> 12 Bucket: 96, 0, 335, 5323, 10293892, 142872, 0
> 16 Bucket: 128, 0, 533, 6070, 7618615, 472647, 0
> 32 Bucket: 256, 0, 8317, 22133, 36020376, 563479, 0
> 64 Bucket: 512, 0, 1238, 3298, 20138111, 11430742, 0
> 128 Bucket: 1024, 0, 1865, 2963, 21162182, 158752, 0
> 256 Bucket: 2048, 0, 1626, 450, 80253784, 4890164, 0
> mbuf_jumbo_9k: 9216, 603712, 16400, 8744, 4128521064, 2661, 0

> On 03 Jan 2017, at 07:27, Meny Yossefi wrote:
>
> Have you tried increasing the mbufs limit?
> (sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)

> On 04 Jan 2017, at 14:47, Ben RUBSON wrote:
>
> No, I have not tried this yet.
> However, from the numbers above (and below), I think I should increase kern.ipc.nmbjumbo9 instead?

> On 30 Jan 2017, at 15:36, Ben RUBSON wrote:
>
> So, to give some news, increasing kern.ipc.nmbjumbo9 helped a lot.
> Just one very minor issue (compared to the earlier ones) over the last 3 weeks.

Hello,

I'm back today with this issue.
Above is my discussion with Meny from Mellanox at the beginning of 2017.
(The topic was "iSCSI failing, MLX rx_ring errors ?" on the freebsd-net list.)

So this morning the issue came back: some of my iSCSI disks were disconnected.
Below are some numbers.

# vmstat -z | grep -v "0, 0$"
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
8 Bucket: 64, 0, 654, 8522, 28604967, 11, 0
12 Bucket: 96, 0, 976, 5092, 23758734, 78, 0
32 Bucket: 256, 0, 789, 4491, 43446969, 137, 0
64 Bucket: 512, 0, 666, 2750, 47568959, 1272018, 0
128 Bucket: 1024, 0, 1047, 1249, 28774042, 232504, 0
256 Bucket: 2048, 0, 1611, 369, 139988097, 8931139, 0
vmem btag: 56, 0, 2949738, 15506, 18092235, 20908, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 8776, 8610737115, 297, 0

# uname -rs
FreeBSD 11.0-RELEASE-p8

# uptime
3:34p.m. up 88 days, 15:57, 2 users, load averages: 0.95, 0.67, 0.62

# grep kern.ipc.nmb /boot/loader.conf
kern.ipc.nmbjumbo9=2037529
kern.ipc.nmbjumbo16=1

# sysctl kern.ipc | grep mb
kern.ipc.nmbufs: 26080380
kern.ipc.nmbjumbo16: 4
kern.ipc.nmbjumbo9: 6112587
kern.ipc.nmbjumbop: 2037529
kern.ipc.nmbclusters: 4075060
kern.ipc.maxmbufmem: 33382887424

# ifconfig mlxen1
mlxen1: flags=8843 metric 0 mtu 9020
	options=ed07bb
	nd6 options=29
	media: Ethernet autoselect (40Gbase-CR4 )
	status: active

I just caught the issue as it was growing:

# vmstat -z | grep mbuf_jumbo_9k
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0
mbuf_jumbo_9k: 9216, 2037529, 16411, 7320,8735286748, 665, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735298937, 667, 0
mbuf_jumbo_9k: 9216, 2037529, 16438, 7293,8735337634, 667, 0
mbuf_jumbo_9k: 9216, 2037529, 16407, 7324,8735354339, 668, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735382105, 669, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735392836, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735423910, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735456393, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16409, 7322,8735472284, 672, 0
mbuf_jumbo_9k: 9216, 2037529, 16420, 7311,8735512237, 673, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735518502, 675, 0
mbuf_jumbo_9k: 9216, 2037529, 16410, 7321,8735543668, 676, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8735555646, 678, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735568986, 679, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8735579075, 680, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735603983, 681, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735634273, 681, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735646057, 683, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735658213, 684, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8735675678, 686, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735686017, 687, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735707335, 687, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8736016546, 708, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736037292, 709, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8736053865, 710, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8736070103, 711, 0
mbuf_jumbo_9k: 9216, 2037529, 16407, 7324,8736086810, 711, 0
mbuf_jumbo_9k: 9216, 2037529, 16430, 7301,8736098568, 713, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8736122803, 714, 0
mbuf_jumbo_9k: 9216, 2037529, 16417, 7314,8736134322, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736152338, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16403, 7328,8736167677, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736170783, 717, 0
mbuf_jumbo_9k: 9216, 2037529, 16445, 7286,8736546084, 733, 0

During this, top was reporting the following:
Mem: 4056K Active, 426M Inact, 59G Wired, 2531M Free

And in /var/log/messages:
kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection

Any idea why I'm experiencing this?
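The capture above shows FAIL for mbuf_jumbo_9k climbing from 665 to 733 while USED hovers around 16,400 against a LIMIT of 2,037,529, i.e. well under 1% of the zone's limit. A minimal Python sketch for summarizing such a capture, assuming each row carries the eight columns named in the vmstat -z header (ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP), comma-separated, with whitespace treated as insignificant; `parse_vmstat_z_line` and `summarize_capture` are hypothetical helpers, not part of any FreeBSD tool:

```python
"""Summarize repeated `vmstat -z | grep mbuf_jumbo_9k` samples.

Hypothetical helper: assumes rows look like
'mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0'
(zone name, then SIZE LIMIT USED FREE REQ FAIL SLEEP, comma-separated).
"""
from typing import Dict, Iterable, NamedTuple


class ZoneRow(NamedTuple):
    name: str
    size: int
    limit: int
    used: int
    free: int
    req: int
    fail: int
    sleep: int


def parse_vmstat_z_line(line: str) -> ZoneRow:
    """Split on the first ':' for the zone name, then on ',' for the numbers."""
    name, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"no zone name in {line!r}")
    numbers = [int(field) for field in rest.split(",")]  # int() ignores padding
    if len(numbers) != 7:
        raise ValueError(f"expected 7 numeric columns in {line!r}")
    return ZoneRow(name.strip(), *numbers)


def summarize_capture(lines: Iterable[str]) -> Dict[str, float]:
    """Compare the first and last samples of one zone."""
    rows = [parse_vmstat_z_line(line) for line in lines if line.strip()]
    if not rows:
        raise ValueError("empty capture")
    first, last = rows[0], rows[-1]
    peak_used = max(row.used for row in rows)
    return {
        "new_failures": last.fail - first.fail,
        "new_requests": last.req - first.req,
        # How close the zone came to its LIMIT (a LIMIT of 0 means "no limit").
        "peak_used_pct_of_limit": (
            100.0 * peak_used / last.limit if last.limit else 0.0
        ),
    }


if __name__ == "__main__":
    capture = [
        "mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0",
        "mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736170783, 717, 0",
        "mbuf_jumbo_9k: 9216, 2037529, 16445, 7286,8736546084, 733, 0",
    ]
    print(summarize_capture(capture))
```

Over the full capture (first to last sample) this reports 68 new failures across 1,299,677 requests, with peak usage below 1% of the limit: failures that cannot be explained by the zone limit, which is the pattern Ryan describes later in the thread.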
Thank you very much for your help & support,

Best regards,

Ben

From owner-freebsd-scsi@freebsd.org Sun Jun 25 15:14:08 2017
From: Ryan Stone
Date: Sun, 25 Jun 2017 11:14:07 -0400
Subject: Re: mbuf_jumbo_9k & iSCSI failing
To: Ben RUBSON
Cc: FreeBSD Net, "freebsd-scsi@freebsd.org"

Is this setup using the mlx4_en driver? If so, recent versions of that driver have a regression when using MTUs greater than the page size (4096 on i386/amd64). The bug will cause the card to drop packets when the system is under memory pressure, and in certain cases the card can get into a state where it is no longer able to receive packets. I am working on a fix; I can post a patch when it's complete.
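To put numbers on "MTUs greater than the page size": with a 9020-byte MTU, a received frame no longer fits in one 4096-byte page, so a driver can either place it in one 9 KiB cluster (which occupies several physically adjacent pages) or scatter it over several independent page-sized buffers. A small Python sketch of that arithmetic, assuming the amd64 cluster sizes from FreeBSD's <sys/param.h> (MCLBYTES, MJUMPAGESIZE, MJUM9BYTES, MJUM16BYTES) and a 14-byte untagged Ethernet header, ignoring any extra overhead the mlx4_en driver may add; `single_cluster_size` and `rx_buffer_plan` are hypothetical names:

```python
"""Receive-buffer arithmetic for an MTU larger than the page size.

Cluster sizes mirror FreeBSD's amd64 values; the header overhead is an
assumption (untagged Ethernet, no driver-specific padding).
"""

PAGE_SIZE = 4096
MCLBYTES = 2048            # standard mbuf cluster
MJUMPAGESIZE = PAGE_SIZE   # page-sized jumbo cluster
MJUM9BYTES = 9 * 1024      # 9216, matches the mbuf_jumbo_9k SIZE column
MJUM16BYTES = 16 * 1024
ETHER_HDR_LEN = 14         # assumption: no VLAN tag


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def single_cluster_size(frame_len: int) -> int:
    """Smallest standard cluster that holds the whole frame in one buffer."""
    for size in (MCLBYTES, MJUMPAGESIZE, MJUM9BYTES, MJUM16BYTES):
        if frame_len <= size:
            return size
    raise ValueError(f"{frame_len}-byte frame exceeds the largest cluster")


def rx_buffer_plan(mtu: int) -> dict:
    frame_len = mtu + ETHER_HDR_LEN
    cluster = single_cluster_size(frame_len)
    return {
        "frame_len": frame_len,
        # Strategy 1: one cluster, which needs this many *adjacent* pages.
        "single_cluster_bytes": cluster,
        "contiguous_pages_needed": ceil_div(cluster, PAGE_SIZE),
        # Strategy 2: scatter the frame over independent page-sized
        # buffers, each of which needs only one free page.
        "page_sized_buffers": ceil_div(frame_len, MJUMPAGESIZE),
    }


if __name__ == "__main__":
    for mtu in (1500, 4000, 9020):
        print(mtu, rx_buffer_plan(mtu))
```

For the 9020-byte MTU in this thread both strategies touch three pages, but only the single-cluster strategy needs those three pages to be physically adjacent.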
From owner-freebsd-scsi@freebsd.org Sun Jun 25 15:28:35 2017
Subject: Re: mbuf_jumbo_9k & iSCSI failing
From: Ben RUBSON
Date: Sun, 25 Jun 2017 17:28:32 +0200
Cc: FreeBSD Net, "freebsd-scsi@freebsd.org"
To: Ryan Stone

> On 25 Jun 2017, at 17:14, Ryan Stone wrote:
>
> Is this setup using the mlx4_en driver? If so, recent versions of that driver have a regression when using MTUs greater than the page size (4096 on i386/amd64). The bug will cause the card to drop packets when the system is under memory pressure, and in certain cases the card can get into a state where it is no longer able to receive packets. I am working on a fix; I can post a patch when it's complete.

Thank you very much for your feedback, Ryan.
Yes, my system is using the mlx4_en driver, the one directly from the FreeBSD 11.0 source tree.
Is there any indicator I could check to be sure I'm experiencing the issue you are working on?
It sounds like I may be suffering from it anyway...
Of course, I would be glad to help test your patch when it's complete.

Thank you again,

Ben

From owner-freebsd-scsi@freebsd.org Sun Jun 25 15:32:01 2017
From: Ryan Stone
Date: Sun, 25 Jun 2017 11:32:00 -0400
Subject: Re: mbuf_jumbo_9k & iSCSI failing
To: Ben RUBSON
Cc: FreeBSD Net, "freebsd-scsi@freebsd.org"

Having looked at the original email more closely, I see that you showed an mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf clusters increase while you are far below the zone's limit means that you're definitely running into the bug I'm describing, and this bug could plausibly cause the iSCSI errors that you describe.

The issue is that the newer version of the driver tries to allocate a single buffer to accommodate an MTU-sized packet.
Over time, however, memory will become fragmented and eventually it can become impossible to allocate a 9k physically contiguous buffer. When this happens the driver is unable to allocate buffers to receive packets and is forced to drop them. Presumably, if iSCSI suffers too many packet drops it will terminate the connection. The older version of the driver limited itself to page-sized buffers, so it was immune to issues with memory fragmentation.

From owner-freebsd-scsi@freebsd.org Sun Jun 25 16:56:49 2017
Subject: Re: mbuf_jumbo_9k & iSCSI failing
From: Ben RUBSON
Date: Sun, 25 Jun 2017 18:56:45 +0200
Cc: FreeBSD Net, "freebsd-scsi@freebsd.org"
To: Ryan Stone

> On 25 Jun 2017, at 17:32, Ryan Stone wrote:
>
> Having looked at the original email more closely, I see that you showed an mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf clusters increase while you are far below the zone's limit means that you're definitely running into the bug I'm describing, and this bug could plausibly cause the iSCSI errors that you describe.
>
> The issue is that the newer version of the driver tries to allocate a single buffer to accommodate an MTU-sized packet. Over time, however, memory will become fragmented and eventually it can become impossible to allocate a 9k physically contiguous buffer. When this happens the driver is unable to allocate buffers to receive packets and is forced to drop them. Presumably, if iSCSI suffers too many packet drops it will terminate the connection. The older version of the driver limited itself to page-sized buffers, so it was immune to issues with memory fragmentation.

Thank you for your explanation, Ryan.
You say "over time", and you're right: I have to wait several days (88 here) before the problem occurs.
It is strange, however, that with 2500MB of free memory the system is unable to find 9k of physically contiguous memory. But you never know :)
So let's wait for your patch! (And reboot for now.)
Many thanks!
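A large amount of free memory does not imply that several adjacent free pages exist, which is why a physically contiguous 9k allocation can fail while 2.5 GB is reported free. A toy Python model, assuming 4 KiB pages and that one 9216-byte cluster needs three adjacent free pages; the page map is invented for illustration, and the model ignores how the real FreeBSD physical page allocator manages and reclaims memory:

```python
"""Toy model: plenty of free pages, but no run of three adjacent ones."""
from typing import Sequence

PAGE_SIZE = 4096
PAGES_PER_9K_CLUSTER = -(-9216 // PAGE_SIZE)  # ceil(9216 / 4096) = 3


def longest_free_run(free_map: Sequence[bool]) -> int:
    """Length of the longest run of consecutive free pages."""
    best = current = 0
    for is_free in free_map:
        current = current + 1 if is_free else 0
        best = max(best, current)
    return best


def can_allocate_contiguous(free_map: Sequence[bool], pages: int) -> bool:
    """True if `pages` physically adjacent free pages exist."""
    return longest_free_run(free_map) >= pages


def can_allocate_scattered(free_map: Sequence[bool], pages: int) -> bool:
    """True if `pages` free pages exist anywhere (page-sized buffers)."""
    return sum(free_map) >= pages


if __name__ == "__main__":
    # Hypothetical layout: every third page is in use (e.g. wired),
    # so two thirds of memory is free but no three free pages touch.
    free_map = [True, True, False] * 320_000
    free_mib = sum(free_map) * PAGE_SIZE / 2**20
    print(f"free: {free_mib:.0f} MiB, "
          f"longest free run: {longest_free_run(free_map)} pages")
    print("one 9k cluster:",
          can_allocate_contiguous(free_map, PAGES_PER_9K_CLUSTER))
    print("three page-sized buffers:",
          can_allocate_scattered(free_map, PAGES_PER_9K_CLUSTER))
```

With this layout the model reports 2500 MiB free and a longest free run of two pages, so a single 9k cluster cannot be placed, while three separate page-sized buffers are trivially available.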
Ben

From owner-freebsd-scsi@freebsd.org Sun Jun 25 19:04:12 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-scsi@FreeBSD.org
Subject: [Bug 218830] [cam] [patch] add CAM pass(4) support for NVMe
Date: Sun, 25 Jun 2017 19:04:11 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218830

chuck@tuffli.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #182021|0                           |1
        is obsolete|                            |

--- Comment #1 from chuck@tuffli.net ---
Created attachment 183788
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=183788&action=edit
Updated patch based on review comments

Updates the patch to address comments made in review
https://reviews.freebsd.org/D10247

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-scsi@freebsd.org Mon Jun 26 00:40:53 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-scsi@FreeBSD.org
Subject: [Bug 220267] [patch] NVMe kernel driver should use 32-bit NSID
Date: Mon, 26 Jun 2017 00:40:53 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220267

Conrad Meyer changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cem@freebsd.org
           Assignee|freebsd-bugs@FreeBSD.org    |freebsd-scsi@FreeBSD.org

From owner-freebsd-scsi@freebsd.org Mon Jun 26 06:19:38 2017
Subject: Re: mps(4) blocks panic-reboot
To: freebsd-scsi@freebsd.org
From: "Eugene M. Zheganin"
Date: Mon, 26 Jun 2017 11:19:30 +0500

Hi.

On 02.06.2017 22:13, Stephen Mcconnell via freebsd-scsi wrote:
> Thanks Harry. I'll need to do some testing here to see if I can figure it
> out.

Guys, can I ask what the status of this is? Was it committed/MFC'd? It seems I'm having the very same issue with mpr(4). Look:

http://static.enaza.ru/userupload/gyazo/4da4b1c84c48bd592e676af46fff.png

It happens at least twice a week.

Eugene.

From owner-freebsd-scsi@freebsd.org Mon Jun 26 08:42:38 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-scsi@FreeBSD.org
Subject: [Bug 219701] crash in camperiphfree()
Date: Mon, 26 Jun 2017 08:42:38 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: CURRENT
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: avg@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: ken@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219701

--- Comment #3 from Andriy Gapon ---
(In reply to Kenneth D. Merry from comment #2)

I never had a way to reproduce the problem, so I cannot say with 100% certainty that the problem is fixed. The code looks like it does fix the problem. I do not see any regressions from the patch. And I have not seen any recurrence of the problem.

Thank you!

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-scsi@freebsd.org Mon Jun 26 13:02:10 2017
From: Edward Napierala
To: Ryan Stone
Cc: Ben RUBSON, FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Date: Mon, 26 Jun 2017 14:02:08 +0100

2017-06-25 16:32 GMT+01:00 Ryan Stone:

> Having looking at the original email more closely, I see that you showed an
> mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf
> clusters increase while you are far below the zone's limit means that
> you're definitely running into the bug I'm describing, and this bug could
> plausibly cause the iSCSI errors that you describe.
>
> The issue is that the newer version of the driver tries to allocate a
> single buffer to accommodate an MTU-sized packet. Over time, however,
> memory will become fragmented and eventually it can become impossible to
> allocate a 9k physically contiguous buffer. When this happens the driver
> is unable to allocate buffers to receive packets and is forced to drop
> them. Presumably, if iSCSI suffers too many packet drops it will terminate
> the connection.
[..]

More specifically, it will terminate the connection when there's no "ping reply" from the other side for the configured amount of time, which defaults to five seconds. It can be changed using the kern.iscsi.ping_timeout sysctl, as described in iscsi(4).
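The following minimal C sketch puts numbers on the fragmentation argument quoted above. It is an illustration only, not code from the Mellanox driver or any other driver.

It counts how many receive clusters one frame needs at each cluster size, and how many physically contiguous pages a single cluster of that size occupies. The sizes mirror FreeBSD's MCLBYTES, MJUMPAGESIZE and MJUM9BYTES on a platform with 4 KB pages.

The following are assumptions made for this example:
- the SKETCH_ constants
- the helper functions
- the 18 bytes of Ethernet header plus FCS added to the MTU

```c
#include <assert.h>

/*
 * Sketch only: cluster sizes as FreeBSD defines them in <sys/param.h>
 * on a platform with 4 KB pages (e.g. amd64), renamed so this compiles
 * in user space without kernel headers.
 */
#define SKETCH_PAGE_SIZE	4096
#define SKETCH_MCLBYTES		2048			/* standard 2 KB cluster */
#define SKETCH_MJUMPAGESIZE	SKETCH_PAGE_SIZE	/* page-sized jumbo cluster */
#define SKETCH_MJUM9BYTES	(9 * 1024)		/* 9 KB jumbo cluster */

/* Assumption: frame = MTU + 14-byte Ethernet header + 4-byte FCS, no VLAN tag. */
int
frame_bytes(int mtu)
{
	return (mtu + 14 + 4);
}

/* Clusters of size 'cluster' that must be chained to hold 'frame' bytes. */
int
clusters_needed(int frame, int cluster)
{
	assert(frame > 0 && cluster > 0);
	return ((frame + cluster - 1) / cluster);
}

/*
 * Minimum number of physically contiguous pages one cluster of this
 * size spans. A result above 1 means the allocator must find adjacent
 * free pages, which is what gets hard once memory is fragmented.
 */
int
contiguous_pages(int cluster)
{
	assert(cluster > 0);
	return ((cluster + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

Take the 9020-byte MTU from the original report:
- With 9k clusters, clusters_needed(frame_bytes(9020), SKETCH_MJUM9BYTES) is 1, but that one cluster needs 3 adjacent free pages.
- With page-sized clusters, the same frame takes 3 clusters, and each can come from any free page.

That is why limiting a driver to page-sized buffers sidesteps fragmentation, provided the NIC can spread one frame over several receive buffers.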
From owner-freebsd-scsi@freebsd.org Mon Jun 26 13:16:23 2017
From: "Andrey V. Elsukov"
To: Ryan Stone, Ben RUBSON
Cc: FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Date: Mon, 26 Jun 2017 16:13:33 +0300

On 25.06.2017 18:32, Ryan Stone wrote:
> Having looking at the original email more closely, I see that you showed an
> mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf
> clusters increase while you are far below the zone's limit means that
> you're definitely running into the bug I'm describing, and this bug could
> plausibly cause the iSCSI errors that you describe.
>
> The issue is that the newer version of the driver tries to allocate a
> single buffer to accommodate an MTU-sized packet. Over time, however,
> memory will become fragmented and eventually it can become impossible to
> allocate a 9k physically contiguous buffer. When this happens the driver
> is unable to allocate buffers to receive packets and is forced to drop
> them. Presumably, if iSCSI suffers too many packet drops it will terminate
> the connection. The older version of the driver limited itself to
> page-sized buffers, so it was immune to issues with memory fragmentation.

I think it is not an mlxen-specific problem; we have the same symptoms with
the ixgbe(4) driver too. To avoid the problem we have patches that disable
the use of 9k mbufs and use only 4k mbufs instead.

-- 
WBR, Andrey V. Elsukov

From owner-freebsd-scsi@freebsd.org Mon Jun 26 13:27:12 2017
From: Ben RUBSON
To: "Andrey V. Elsukov"
Cc: Ryan Stone, FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Date: Mon, 26 Jun 2017 15:27:07 +0200

> On 26 Jun 2017, at 15:13, Andrey V. Elsukov wrote:
>
> I think it is not an mlxen-specific problem; we have the same symptoms with
> the ixgbe(4) driver too. To avoid the problem we have patches that disable
> the use of 9k mbufs and use only 4k mbufs instead.

Interesting feedback Andrey, thank you!
The problem may then be "general".
So you still use a large MTU (>=9000) but only allocate 4k mbufs, as a workaround?

From owner-freebsd-scsi@freebsd.org Mon Jun 26 13:28:19 2017
From: "Andrey V. Elsukov"
To: Ben RUBSON
Cc: Ryan Stone, FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Date: Mon, 26 Jun 2017 16:25:28 +0300

On 26.06.2017 16:27, Ben RUBSON wrote:
>
>> On 26 Jun 2017, at 15:13, Andrey V. Elsukov wrote:
>>
>> I think it is not an mlxen-specific problem; we have the same symptoms with
>> the ixgbe(4) driver too. To avoid the problem we have patches that disable
>> the use of 9k mbufs and use only 4k mbufs instead.
>
> Interesting feedback Andrey, thank you!
> The problem may then be "general".
> So you still use a large MTU (>=9000) but only allocate 4k mbufs, as a workaround?

Yes.

-- 
WBR, Andrey V. Elsukov

From owner-freebsd-scsi@freebsd.org Mon Jun 26 13:29:51 2017
From: Ben RUBSON
To: "Andrey V. Elsukov"
Cc: Ryan Stone, FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Date: Mon, 26 Jun 2017 15:29:48 +0200

> On 26 Jun 2017, at 15:25, Andrey V. Elsukov wrote:
>
> On 26.06.2017 16:27, Ben RUBSON wrote:
>>
>>> On 26 Jun 2017, at 15:13, Andrey V. Elsukov wrote:
>>>
>>> I think it is not an mlxen-specific problem; we have the same symptoms with
>>> the ixgbe(4) driver too. To avoid the problem we have patches that disable
>>> the use of 9k mbufs and use only 4k mbufs instead.
>>
>> Interesting feedback Andrey, thank you!
>> The problem may then be "general".
>> So you still use a large MTU (>=9000) but only allocate 4k mbufs, as a workaround?
>
> Yes.

Is it a kernel patch or a driver/ixgbe patch?

From owner-freebsd-scsi@freebsd.org Mon Jun 26 13:39:28 2017
From: "Andrey V. Elsukov"
To: Ben RUBSON
Cc: Ryan Stone, FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Date: Mon, 26 Jun 2017 16:36:35 +0300

On 26.06.2017 16:29, Ben RUBSON wrote:
>
>> On 26 Jun 2017, at 15:25, Andrey V. Elsukov wrote:
>>
>> On 26.06.2017 16:27, Ben RUBSON wrote:
>>>
>>>> On 26 Jun 2017, at 15:13, Andrey V. Elsukov wrote:
>>>>
>>>> I think it is not an mlxen-specific problem; we have the same symptoms with
>>>> the ixgbe(4) driver too. To avoid the problem we have patches that disable
>>>> the use of 9k mbufs and use only 4k mbufs instead.
>>>
>>> Interesting feedback Andrey, thank you!
>>> The problem may then be "general".
>>> So you still use a large MTU (>=9000) but only allocate 4k mbufs, as a workaround?
>>
>> Yes.
>
> Is it a kernel patch or a driver/ixgbe patch?

I attached it.

-- 
WBR, Andrey V. Elsukov

Content-Type: text/x-patch; name="0004-Add-m_preferredsize-and-use-it-in-all-intel-s-driver.patch"
Content-Disposition: attachment; filename="0004-Add-m_preferredsize-and-use-it-in-all-intel-s-driver.patch"

From 57b4789b7f6699a581ded2f4e07c7b12737af1e7 Mon Sep 17 00:00:00 2001
From: "Andrey V. Elsukov"
Date: Thu, 6 Oct 2016 14:56:37 +0300
Subject: [PATCH 04/65] Add m_preferredsize() and use it in all intel's drivers.

---
 sys/dev/e1000/if_em.c     |  7 +------
 sys/dev/e1000/if_igb.c    |  7 +------
 sys/dev/ixgbe/if_ix.c     |  5 +----
 sys/dev/ixgbe/if_ixv.c    |  5 +----
 sys/dev/ixl/if_ixlv.c     |  5 +----
 sys/dev/ixl/ixl_pf_main.c |  5 +----
 sys/kern/kern_mbuf.c      | 35 +++++++++++++++++++++++++++++++++++
 sys/sys/mbuf.h            |  1 +
 8 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/sys/dev/e1000/if_em.c b/sys/dev/e1000/if_em.c
index 7e2690eae08..1af66b7c519 100644
--- a/sys/dev/e1000/if_em.c
+++ b/sys/dev/e1000/if_em.c
@@ -1421,12 +1421,7 @@ em_init_locked(struct adapter *adapter)
 	** Figure out the desired mbuf
 	** pool for doing jumbos
 	*/
-	if (adapter->hw.mac.max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->hw.mac.max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else
-		adapter->rx_mbuf_sz = MJUM9BYTES;
+	adapter->rx_mbuf_sz = m_preferredsize(adapter->hw.mac.max_frame_size);
 
 	/* Prepare receive descriptors and buffers */
 	if (em_setup_receive_structures(adapter)) {
diff --git a/sys/dev/e1000/if_igb.c b/sys/dev/e1000/if_igb.c
index 8e018995029..bfaecae1f71 100644
--- a/sys/dev/e1000/if_igb.c
+++ b/sys/dev/e1000/if_igb.c
@@ -1325,12 +1325,7 @@ igb_init_locked(struct adapter *adapter)
 	** Figure out the desired mbuf pool
 	** for doing jumbo/packetsplit
 	*/
-	if (adapter->max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else
- adapter->rx_mbuf_sz =3D MJUM9BYTES; + adapter->rx_mbuf_sz =3D m_preferredsize(adapter->max_frame_size); =20 /* Prepare receive descriptors and buffers */ if (igb_setup_receive_structures(adapter)) { diff --git a/sys/dev/ixgbe/if_ix.c b/sys/dev/ixgbe/if_ix.c index cf2231dc8fc..26fce2704ba 100644 --- a/sys/dev/ixgbe/if_ix.c +++ b/sys/dev/ixgbe/if_ix.c @@ -1118,10 +1118,7 @@ ixgbe_init_locked(struct adapter *adapter) ixgbe_set_multi(adapter); =20 /* Determine the correct mbuf pool, based on frame size */ - if (adapter->max_frame_size <=3D MCLBYTES) - adapter->rx_mbuf_sz =3D MCLBYTES; - else - adapter->rx_mbuf_sz =3D MJUMPAGESIZE; + adapter->rx_mbuf_sz =3D m_preferredsize(adapter->max_frame_size); =20 /* Prepare receive descriptors and buffers */ if (ixgbe_setup_receive_structures(adapter)) { diff --git a/sys/dev/ixgbe/if_ixv.c b/sys/dev/ixgbe/if_ixv.c index 80fb1b34be3..5062affb779 100644 --- a/sys/dev/ixgbe/if_ixv.c +++ b/sys/dev/ixgbe/if_ixv.c @@ -698,10 +698,7 @@ ixv_init_locked(struct adapter *adapter) ** Determine the correct mbuf pool ** for doing jumbo/headersplit */ - if (ifp->if_mtu > ETHERMTU) - adapter->rx_mbuf_sz =3D MJUMPAGESIZE; - else - adapter->rx_mbuf_sz =3D MCLBYTES; + adapter->rx_mbuf_sz =3D m_preferredsize(ifp->if_mtu); =20 /* Prepare receive descriptors and buffers */ if (ixgbe_setup_receive_structures(adapter)) { diff --git a/sys/dev/ixl/if_ixlv.c b/sys/dev/ixl/if_ixlv.c index c447c34689e..608d784bfee 100644 --- a/sys/dev/ixl/if_ixlv.c +++ b/sys/dev/ixl/if_ixlv.c @@ -904,10 +904,7 @@ ixlv_init_locked(struct ixlv_sc *sc) =20 ixl_init_tx_ring(que); =20 - if (vsi->max_frame_size <=3D MCLBYTES) - rxr->mbuf_sz =3D MCLBYTES; - else - rxr->mbuf_sz =3D MJUMPAGESIZE; + rxr->mbuf_sz =3D m_preferredsize(vsi->max_frame_size); ixl_init_rx_ring(que); } =20 diff --git a/sys/dev/ixl/ixl_pf_main.c b/sys/dev/ixl/ixl_pf_main.c index d8da4cfee10..8600b0f931e 100644 --- a/sys/dev/ixl/ixl_pf_main.c +++ b/sys/dev/ixl/ixl_pf_main.c @@ -2067,10 +2067,7 @@ 
ixl_initialize_vsi(struct ixl_vsi *vsi) ixl_init_tx_ring(que); =20 /* Next setup the HMC RX Context */ - if (vsi->max_frame_size <=3D MCLBYTES) - rxr->mbuf_sz =3D MCLBYTES; - else - rxr->mbuf_sz =3D MJUMPAGESIZE; + rxr->mbuf_sz =3D m_preferredsize(vsi->max_frame_size); =20 u16 max_rxmax =3D rxr->mbuf_sz * hw->func_caps.rx_buf_chain_len; =20 diff --git a/sys/kern/kern_mbuf.c b/sys/kern/kern_mbuf.c index 0d0c1c86b16..7c10cedb075 100644 --- a/sys/kern/kern_mbuf.c +++ b/sys/kern/kern_mbuf.c @@ -103,6 +103,10 @@ int nmbjumbop; /* limits number of page size jumbo= clusters */ int nmbjumbo9; /* limits number of 9k jumbo clusters */ int nmbjumbo16; /* limits number of 16k jumbo clusters */ =20 +static int nojumbobuf; /* Use MCLBYTES mbufs */ +static int nojumbo9buf; /* Use either MCLBYTES or MJUMPAGESIZE */ +static int nojumbo16buf; /* Use any mbuf size less than MJUM16BYTES */ + static quad_t maxmbufmem; /* overall real memory limit for all mbufs */ =20 SYSCTL_QUAD(_kern_ipc, OID_AUTO, maxmbufmem, CTLFLAG_RDTUN | CTLFLAG_NOF= ETCH, &maxmbufmem, 0, @@ -151,6 +155,16 @@ tunable_mbinit(void *dummy) if (nmbufs < nmbclusters + nmbjumbop + nmbjumbo9 + nmbjumbo16) nmbufs =3D lmax(maxmbufmem / MSIZE / 5, nmbclusters + nmbjumbop + nmbjumbo9 + nmbjumbo16); + /* + * Defaults to disable 9/16-kbyte pages + */ + nojumbobuf =3D 0; + nojumbo9buf =3D 1; + nojumbo16buf =3D 1; + + TUNABLE_INT_FETCH("kern.ipc.nojumbobuf", &nojumbobuf); + TUNABLE_INT_FETCH("kern.ipc.nojumbo9buf", &nojumbo9buf); + TUNABLE_INT_FETCH("kern.ipc.nojumbo16buf", &nojumbo16buf); } SYSINIT(tunable_mbinit, SI_SUB_KMEM, SI_ORDER_MIDDLE, tunable_mbinit, NU= LL); =20 @@ -261,6 +275,27 @@ SYSCTL_PROC(_kern_ipc, OID_AUTO, nmbufs, CTLTYPE_INT= |CTLFLAG_RW, "Maximum number of mbufs allowed"); =20 /* + * Determine the correct mbuf pool + * for given mtu size + */ +int +m_preferredsize(int mtu) +{ + int size; + + if (mtu <=3D 2048 || nojumbobuf !=3D 0) + size =3D MCLBYTES; + else if (mtu <=3D 4096 || nojumbo9buf !=3D 0) + size 
=3D MJUMPAGESIZE; + else if (mtu <=3D 9216 || nojumbo16buf !=3D 0) + size =3D MJUM9BYTES; + else + size =3D MJUM16BYTES; + + return (size); +} + +/* * Zones from which we allocate. */ uma_zone_t zone_mbuf; diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h index fdd9931515d..b6a81b05e3b 100644 --- a/sys/sys/mbuf.h +++ b/sys/sys/mbuf.h @@ -606,6 +606,7 @@ u_int m_length(struct mbuf *, struct mbuf **); int m_mbuftouio(struct uio *, struct mbuf *, int); void m_move_pkthdr(struct mbuf *, struct mbuf *); int m_pkthdr_init(struct mbuf *, int); +int m_preferredsize(int); struct mbuf *m_prepend(struct mbuf *, int, int); void m_print(const struct mbuf *, int); struct mbuf *m_pulldown(struct mbuf *, int, int, int *); --=20 2.12.1 --------------FF1D5726D14DBD643DE462AD-- --g9A9E4Di6C80mHn9j0G28L5VKTWTg18l6-- --8VTMwHwj940cm1xRx4AhDLWSOS98RD7OQ Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEzBAEBCAAdFiEE5lkeG0HaFRbwybwAAcXqBBDIoXoFAllRDeMACgkQAcXqBBDI oXqqvAf6AwhLtFohbNT9kfkP6P0pMIGoXCYwQ7ACxYVLmTLzKorGGV4aj+DalTjv Dv7H2ICYXBcESgH8xjZgMKeAVxIfMlsvGVbRwQs3rnSO9bMjGLXsPxcD6ymvZf4L tgxm4aBPyPFevdBD6DdU7bdfv+Ml1c15iQ/Vr5khQaplMkcw2q0mUI6efJD6agp0 5fjd7kxrHDxranQr+DcW6lw+pd4GQakBLy5JNODTESMOc4DaUbhGVE79nRvMqTMU LjJAWxQGqvwRvvy3RkWuCczjuBelM0Cb9U8HxPbDzumyXbY113raLidGJbabnuul YPGe76RwAwFQiFkVaLiplL8pMkaAZw== =0JjL -----END PGP SIGNATURE----- --8VTMwHwj940cm1xRx4AhDLWSOS98RD7OQ-- From owner-freebsd-scsi@freebsd.org Mon Jun 26 13:57:51 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C6E12D87ED2; Mon, 26 Jun 2017 13:57:51 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wr0-x242.google.com 
(mail-wr0-x242.google.com [IPv6:2a00:1450:400c:c0c::242]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 520A17A71D; Mon, 26 Jun 2017 13:57:51 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wr0-x242.google.com with SMTP id 77so29375157wrb.3; Mon, 26 Jun 2017 06:57:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=BEEoNVE2CrKbx9gHAYaoAeYCLOeLKlHzIeLaw57IxhM=; b=UsYDHf2MKfhWumAmRebPCb6bCSggmRSDFnFLjuvilFmech2TywdD/ndov24EjhoymJ Z2Xx97kkP6A+660RgmcnK6bOF/Lf6JfFns6Ja7iq+72PLpdf6/s0hb2M+xfFpMy5xq06 p/SSkkuQFuwGoOEBFurEmKk2FpCHmmPH6oT9iovIDwColSEWQftxecL3BcSmutekcZzw MlCIxloPgnmHv6qCQBeiHNyUybf0uA1QzEu/DJXmu3gwiLMSRHHAfEg59psgYMWytE7E aTRhzAte7rFHY6zG+L+RI3FWJ/uA+Z/Ow2aF4+CNyMhFWel2kRhCSLA9q4AU6JfSR0Mk KX/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=BEEoNVE2CrKbx9gHAYaoAeYCLOeLKlHzIeLaw57IxhM=; b=EPOM7iZ+mG1ijkPCVmnie3cdGRmNCjdIDB7/RhaLEC7EPf+Ft9cCAYb42/Q/hV7nOu 6KSmTAcu210B1VyQoWEilui8BNa1ZWvvDl1z7u9AW9pSHOoPQn8/Q7oWA5mH3n7hsdaW IyygC2ElKuo+wtVMSbBAa3jX6pIbaa67C4o3yDbx+lL9q1qeCQ3NhJ9VRQlcdTvtGCxR nYGGK9fwdUGOJ+io9s9a/4llFsUHUizmN3bdptazJWCiePFDqSwwrmjb6iEP+kvJMZQD kgH5MPTFPNgyRzyhaoxtpwMShGkXNLXbrrSwHOfpcAjGHEZ+nTa/o99GDKSM2YXuJAtQ lAgw== X-Gm-Message-State: AKS2vOxJrNXYjufhTjxToBVUxgc+b5YxWwlOfkrsJs+hdAs3CzlYj8wc Cuxfkljmx/0vdw== X-Received: by 10.223.151.51 with SMTP id r48mr14803383wrb.189.1498485469753; Mon, 26 Jun 2017 06:57:49 -0700 (PDT) Received: from ben.home (LFbn-1-7159-4.w90-116.abo.wanadoo.fr. 
[90.116.90.4]) by smtp.gmail.com with ESMTPSA id p87sm209778wma.2.2017.06.26.06.57.48 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 26 Jun 2017 06:57:49 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: mbuf_jumbo_9k & iSCSI failing From: Ben RUBSON In-Reply-To: <61f98b7d-f55d-aa0f-4aef-1bdfbc7086ff@yandex.ru> Date: Mon, 26 Jun 2017 15:57:48 +0200 Cc: Ryan Stone , FreeBSD Net , "freebsd-scsi@freebsd.org" Content-Transfer-Encoding: quoted-printable Message-Id: <2C291A70-B6DD-4E21-9106-4FE023E9EAFE@gmail.com> References: <486A6DA0-54C8-40DF-8437-F6E382DA01A8@gmail.com> <6a31ef00-5f7a-d36e-d5e6-0414e8b813c7@selasky.org> <613AFD8E-72B2-4E3F-9C70-1D1E43109B8A@gmail.com> <2c9a9c2652a74d8eb4b34f5a32c7ad5c@AM5PR0502MB2916.eurprd05.prod.outlook.com> <52A2608C-A57E-4E75-A952-F4776BA23CA4@gmail.com> <9B507AA6-40FE-4B8D-853F-2A9422A2DF67@gmail.com> <64abec26-e310-d66d-93ae-3536914ddd84@yandex.ru> <86D76532-92F4-479C-A714-126D007AD91F@gmail.com> <61f98b7d-f55d-aa0f-4aef-1bdfbc7086ff@yandex.ru> To: "Andrey V. Elsukov" X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Jun 2017 13:57:51 -0000 > On 26 Jun 2017, at 15:36, Andrey V. Elsukov wrote: >=20 > On 26.06.2017 16:29, Ben RUBSON wrote: >>=20 >>> On 26 Jun 2017, at 15:25, Andrey V. Elsukov = wrote: >>>=20 >>> On 26.06.2017 16:27, Ben RUBSON wrote: >>>>=20 >>>>> On 26 Jun 2017, at 15:13, Andrey V. Elsukov = wrote: >>>>>=20 >>>>> I think it is not mlxen specific problem, we have the same = symptoms with >>>>> ixgbe(4) driver too. To avoid the problem we have patches that are >>>>> disable using of 9k mbufs, and instead only use 4k mbufs. >>>>=20 >>>> Interesting feedback Andrey, thank you ! >>>> The problem may be then "general". 
>>>> So you still use large MTU (>=9000) but only allocating 4k mbufs, as a workaround ?
>>>
>>> Yes.
>>
>> Is it a kernel patch or a driver/ixgbe patch ?
>
> I attached it.

Thank you!
The idea of new sysctls to enable/disable the workaround is nice.
It should be easy to adapt for mlx4_en while waiting for Ryan's work specific to that driver.

I found a similar issue, reported on 2013-10-28:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183381

FreeBSD certainly needs a general, solid patch!

From owner-freebsd-scsi@freebsd.org Mon Jun 26 14:00:54 2017
Date: Mon, 26 Jun 2017 15:44:58 +0200
From: Julien Cigar
To: "Andrey V. Elsukov"
Cc: Ryan Stone, Ben RUBSON, FreeBSD Net, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Message-ID: <20170626134458.GT43966@mordor.lan>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="TgCXP+xznsSrEyty"
Content-Disposition: inline
In-Reply-To: <64abec26-e310-d66d-93ae-3536914ddd84@yandex.ru>

--TgCXP+xznsSrEyty
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Mon, Jun 26, 2017 at 04:13:33PM +0300, Andrey V. Elsukov wrote:
> On 25.06.2017 18:32, Ryan Stone wrote:
> > Having looking at the original email more closely, I see that you showed an
> > mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf
> > clusters increase while you are far below the zone's limit means that
> > you're definitely running into the bug I'm describing, and this bug could
> > plausibly cause the iSCSI errors that you describe.
> >
> > The issue is that the newer version of the driver tries to allocate a
> > single buffer to accommodate an MTU-sized packet. Over time, however,
> > memory will become fragmented and eventually it can become impossible to
> > allocate a 9k physically contiguous buffer. When this happens the driver
> > is unable to allocate buffers to receive packets and is forced to drop
> > them. Presumably, if iSCSI suffers too many packet drops it will terminate
> > the connection. The older version of the driver limited itself to
> > page-sized buffers, so it was immune to issues with memory fragmentation.
>
> I think it is not mlxen specific problem, we have the same symptoms with
> ixgbe(4) driver too. To avoid the problem we have patches that are
> disable using of 9k mbufs, and instead only use 4k mbufs.

I had the same issue on a lightly loaded HP DL20 machine (BCM5720
chipsets), 8GB of RAM, running 10.3. Problem usually happens
within 30 days with 9k jumbo clusters allocation failure.

>
> --
> WBR, Andrey V. Elsukov
>

--
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
--TgCXP+xznsSrEyty--

From owner-freebsd-scsi@freebsd.org Mon Jun 26 15:11:02 2017
Subject: Re: mbuf_jumbo_9k & iSCSI failing
From: Ben RUBSON
In-Reply-To: <64abec26-e310-d66d-93ae-3536914ddd84@yandex.ru>
Date: Mon, 26 Jun 2017 17:10:59 +0200
Cc: Ryan Stone, FreeBSD Net, "freebsd-scsi@freebsd.org"
Message-Id: <8CBA6288-BEB4-4301-8DAE-058B2348F909@gmail.com>
To: "Andrey V. Elsukov"

> On 26 Jun 2017, at 15:13, Andrey V. Elsukov wrote:
>
> I think it is not mlxen specific problem, we have the same symptoms with
> ixgbe(4) driver too. To avoid the problem we have patches that are
> disable using of 9k mbufs, and instead only use 4k mbufs.

Another workaround is to decrease the MTU until 9k mbufs are no longer used.
On my systems that gives a 4072-byte MTU.
It solved the issue without having to reboot.
Of course it's just a workaround, as decreasing the MTU increases overhead...
From owner-freebsd-scsi@freebsd.org Mon Jun 26 15:24:06 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-scsi@FreeBSD.org
Subject: [Bug 220094] [scsi] sys/cam/scsi/scsi_sa.c: a sleep-under-mutex bug in saioctl
Date: Mon, 26 Jun 2017 15:24:05 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.0-RELEASE
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: commit-hook@freebsd.org
X-Bugzilla-Status: New
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: ken@FreeBSD.org
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220094

--- Comment #3 from commit-hook@freebsd.org ---
A commit references this bug:

Author: ken
Date: Mon Jun 26 15:23:12 UTC 2017
New revision: 320361
URL: https://svnweb.freebsd.org/changeset/base/320361

Log:
  MFC r320123:

  Fix a potential sleep while holding a mutex in the sa(4) driver.

  If the user issues a MTIOCEXTGET ioctl, and the tape drive in question has
  a serial number that is longer than 80 characters, we malloc a buffer in
  saextget() to hold the output of cam_strvis().

  Since a mutex is held in that codepath, doing a M_WAITOK malloc could lead
  to sleeping while holding a mutex. Change it to a M_NOWAIT malloc and bail
  out if we fail to allocate the memory. Devices with serial numbers longer
  than 80 bytes are very rare (I don't recall seeing one), so this should be
  a very unusual case to hit. But it is a bug that should be fixed.

  sys/cam/scsi/scsi_sa.c:
	In saextget(), if we need to malloc a buffer to hold the output of
	cam_strvis(), don't wait for the memory. Fail and return an error
	if we can't allocate the memory immediately.
  PR:		kern/220094
  Submitted by:	Jia-Ju Bai
  Sponsored by:	Spectra Logic

Changes:
_U  stable/10/
  stable/10/sys/cam/scsi/scsi_sa.c

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-scsi@freebsd.org Mon Jun 26 15:35:44 2017
From: Stephen Mcconnell
Date: Mon, 26 Jun 2017 09:35:41 -0600
Message-ID: <4c93e12f2776d3c5d372b2652883d4af@mail.gmail.com>
Subject: RE: mps(4) blocks panic-reboot
To: "Eugene M. Zheganin", freebsd-scsi@freebsd.org

The change I provided to Harry didn't work completely, and I haven't been
able to find time to look into it further. Since it wasn't a complete fix
I didn't commit it. I'm not sure when I'll get some time, but hopefully in
the next week or two.

Stephen

> -----Original Message-----
> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Eugene M. Zheganin
> Sent: Monday, June 26, 2017 12:20 AM
> To: freebsd-scsi@freebsd.org
> Subject: Re: mps(4) blocks panic-reboot
>
> Hi.
>
> On 02.06.2017 22:13, Stephen Mcconnell via freebsd-scsi wrote:
> > Thanks Harry. I'll need to do some testing here to see if I can figure
> > it out.
> Guys, can I ask what is the status of this ? Was it commited/MFC'd ?
> Because seems like I'm having the very same issue on a mpr(4).
> Look:
>
> http://static.enaza.ru/userupload/gyazo/4da4b1c84c48bd592e676af46fff.png
>
> It happens at least twice a week.
>
> Eugene.
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

From owner-freebsd-scsi@freebsd.org Mon Jun 26 15:39:55 2017
From: YongHyeon PYUN
Date: Tue, 27 Jun 2017 00:40:10 +0900
To: Julien Cigar
Cc: "Andrey V. Elsukov", FreeBSD Net, Ryan Stone, Ben RUBSON, "freebsd-scsi@freebsd.org"
Subject: Re: mbuf_jumbo_9k & iSCSI failing
Message-ID: <20170626154010.GA2488@michelle.fasterthan.co.kr>
Reply-To: pyunyh@gmail.com
In-Reply-To: <20170626134458.GT43966@mordor.lan>

On Mon, Jun 26, 2017 at 03:44:58PM +0200, Julien Cigar wrote:
> On Mon, Jun 26, 2017 at 04:13:33PM +0300, Andrey V. Elsukov wrote:
> > On 25.06.2017 18:32, Ryan Stone wrote:
> > > Having looking at the original email more closely, I see that you showed an
> > > mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf
> > > clusters increase while you are far below the zone's limit means that
> > > you're definitely running into the bug I'm describing, and this bug could
> > > plausibly cause the iSCSI errors that you describe.
> > >
> > > The issue is that the newer version of the driver tries to allocate a
> > > single buffer to accommodate an MTU-sized packet. Over time, however,
> > > memory will become fragmented and eventually it can become impossible to
> > > allocate a 9k physically contiguous buffer. When this happens the driver
> > > is unable to allocate buffers to receive packets and is forced to drop
> > > them. Presumably, if iSCSI suffers too many packet drops it will terminate
> > > the connection.
The older version of the driver limited itself to > > > page-sized buffers, so it was immune to issues with memory fragmentation. > > > > I think it is not mlxen specific problem, we have the same symptoms with > > ixgbe(4) driver too. To avoid the problem we have patches that are > > disable using of 9k mbufs, and instead only use 4k mbufs. > > I had the same issue on a lightly loaded HP DL20 machine (BCM5720 > chipsets), 8GB of RAM, running 10.3. Problem usually happens > within 30 days with 9k jumbo clusters allocation failure. > This looks strange to me. If I recall correctly bge(4) does not request physically contiguous 9k jumbo buffers for BCM5720 so it wouldn't suffer from memory fragmentation. (It uses m_cljget() and takes advantage of extended RX BDs to handle up to 4 DMA segments). If your controller is either BCM5714/BCM5715 or BCM5780, it requires physically contiguous 9k jumbo buffers to handle jumbo frames though. From owner-freebsd-scsi@freebsd.org Mon Jun 26 16:26:22 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D2BDD8B61C; Mon, 26 Jun 2017 16:26:22 +0000 (UTC) (envelope-from matt.joras@gmail.com) Received: from mail-wr0-x229.google.com (mail-wr0-x229.google.com [IPv6:2a00:1450:400c:c0c::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C9BCB7F36C; Mon, 26 Jun 2017 16:26:21 +0000 (UTC) (envelope-from matt.joras@gmail.com) Received: by mail-wr0-x229.google.com with SMTP id 77so147270960wrb.1; Mon, 26 Jun 2017 09:26:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=L4JxaB/ZQ4GTAlGrgHDF4xJYCRU6c3BJAy3OSKIVAZg=; 
b=HtsofQLi2EZWMxV5XTjidUCq+khQLoNxqvR/s5+Rd2eDv52OwpTcIiAXIAGQJaZFfj RB+2a62ZmjlO+dhC6xnX0kVI39uhc+pY8xb96rhtCjdeKgF+NpKt/IIjDfxmWczY9O8N t1NcoEs0VEAjhu9+SlHJ6ma7WUjYVL6ctQTcZBAg/EMWYkcdKPA7lmKPuOUQWt3iKomQ JuqQSO8+9CwFLEw4FofSO3fQQiMb9hAdNa/tv7r8QDP9ZEQ4FpSrGhejQlmFDKBlZQeY FNoxUxT9TR+AlXlxNsJGSuBggDRN52rgw210xu4qIYRqDRZnFldsc1nrLsy6fUdjxjg9 EeaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=L4JxaB/ZQ4GTAlGrgHDF4xJYCRU6c3BJAy3OSKIVAZg=; b=gyV+bTRhn3vCMkmxx1SFnx/IsJ/gJTOs+xxoRrCeYi+PFc7B4qhj5qZoUgQoRWhZn6 VG1Ol0yuabz/8wLCRCVneiNFvzK01sVPTT669YeGnw7YLQOiyfeMfaX2LpMdOFOdZP2F gw099Sf84FrsvJFw2qjUZrhrujOjkBTMWMqeTV5AMemfLqA90hwpLn3QlcZoApfP8QKE Fg/VZf6lF5aysk+9611VZzdDKlCM0Ad4/2Gjhg1jcpeVP5Telhy3Fi7vI+svDy93TNlC 3M2UR2NtoysE7ErFDv3K6i5HzztHIzFu3At8GEri/H/lG3e1V7vEOcB+wcNuMQJWM7w/ smlQ== X-Gm-Message-State: AKS2vOwqFFnI8Rr6f8H9ANmFvje4axFtFj9Oe5rj3kcvExJwhuQpbEkQ jv+D26Hpu8kg6o4wa3LargKofWqLDA== X-Received: by 10.223.160.40 with SMTP id k37mr13376113wrk.91.1498494380248; Mon, 26 Jun 2017 09:26:20 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.160.42 with HTTP; Mon, 26 Jun 2017 09:26:19 -0700 (PDT) In-Reply-To: <61f98b7d-f55d-aa0f-4aef-1bdfbc7086ff@yandex.ru> References: <486A6DA0-54C8-40DF-8437-F6E382DA01A8@gmail.com> <6a31ef00-5f7a-d36e-d5e6-0414e8b813c7@selasky.org> <613AFD8E-72B2-4E3F-9C70-1D1E43109B8A@gmail.com> <2c9a9c2652a74d8eb4b34f5a32c7ad5c@AM5PR0502MB2916.eurprd05.prod.outlook.com> <52A2608C-A57E-4E75-A952-F4776BA23CA4@gmail.com> <9B507AA6-40FE-4B8D-853F-2A9422A2DF67@gmail.com> <64abec26-e310-d66d-93ae-3536914ddd84@yandex.ru> <86D76532-92F4-479C-A714-126D007AD91F@gmail.com> <61f98b7d-f55d-aa0f-4aef-1bdfbc7086ff@yandex.ru> From: Matt Joras Date: Mon, 26 Jun 2017 09:26:19 -0700 Message-ID: Subject: Re: mbuf_jumbo_9k & iSCSI failing To: "Andrey V. 
Elsukov" Cc: Ben RUBSON , FreeBSD Net , Ryan Stone , "freebsd-scsi@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Jun 2017 16:26:22 -0000 On Mon, Jun 26, 2017 at 6:36 AM, Andrey V. Elsukov wrote: > On 26.06.2017 16:29, Ben RUBSON wrote: >> >>> On 26 Jun 2017, at 15:25, Andrey V. Elsukov wrote: >>> >>> On 26.06.2017 16:27, Ben RUBSON wrote: >>>> >>>>> On 26 Jun 2017, at 15:13, Andrey V. Elsukov wrote: >>>>> >>>>> I think it is not mlxen specific problem, we have the same symptoms with >>>>> ixgbe(4) driver too. To avoid the problem we have patches that are >>>>> disable using of 9k mbufs, and instead only use 4k mbufs. >>>> >>>> Interesting feedback Andrey, thank you ! >>>> The problem may be then "general". >>>> So you still use large MTU (>=9000) but only allocating 4k mbufs, as a workaround ? >>> >>> Yes. >> >> Is it a kernel patch or a driver/ixgbe patch ? > > I attached it. > > -- > WBR, Andrey V. Elsukov I didn't think that ixgbe(4) still suffered from this problem, and we use it in the same situations rstone mentioned above. Indeed, ixgbe(4) doesn't presently suffer from this problem (as your patch shows, it effectively changes only the other drivers), though it used to. It looks like this was first fixed in r280182.
From owner-freebsd-scsi@freebsd.org Mon Jun 26 16:33:04 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52531D8B9C3; Mon, 26 Jun 2017 16:33:04 +0000 (UTC) (envelope-from bu7cher@yandex.ru) Received: from forward3j.cmail.yandex.net (forward3j.cmail.yandex.net [IPv6:2a02:6b8:0:1630::16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "forwards.mail.yandex.net", Issuer "Yandex CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DB2E67F964; Mon, 26 Jun 2017 16:33:03 +0000 (UTC) (envelope-from bu7cher@yandex.ru) Received: from smtp1j.mail.yandex.net (smtp1j.mail.yandex.net [95.108.130.59]) by forward3j.cmail.yandex.net (Yandex) with ESMTP id 19E3D20DAE; Mon, 26 Jun 2017 19:33:00 +0300 (MSK) Received: from smtp1j.mail.yandex.net (localhost.localdomain [127.0.0.1]) by smtp1j.mail.yandex.net (Yandex) with ESMTP id 47C8D3C80F56; Mon, 26 Jun 2017 19:32:56 +0300 (MSK) Received: by smtp1j.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id AJ1ygWDziH-WuoiBls4; Mon, 26 Jun 2017 19:32:56 +0300 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client certificate not present) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1498494776; bh=+Nkw5KkQClkY3jfyBfJ8TMZbKDMZn/Ap1JQPuiUgof0=; h=Subject:To:Cc:References:From:Message-ID:Date:In-Reply-To; b=OOlT2o9BXjq0Bv4mtClYdLxHN0hLJ/LIPB1B3P4JQKGQ0M+8Vk/FRpvr+OEZo7RjV HfTF8wCe2bi0MjLU6M6C6UzQQ1Er96sxe7B1sJeKpRrxm8A6CU2a2FueRqpmcuB5KP Gogqsr7z04ah5SMW5cn6lR8PDJGyGQCYVKBcdQsI= Authentication-Results: smtp1j.mail.yandex.net; dkim=pass header.i=@yandex.ru X-Yandex-Suid-Status: 1 0,1 0,1 0,1 0,1 0 Subject: Re: mbuf_jumbo_9k & iSCSI failing To: Matt Joras Cc: Ben RUBSON , FreeBSD Net , Ryan Stone , "freebsd-scsi@freebsd.org" References: <486A6DA0-54C8-40DF-8437-F6E382DA01A8@gmail.com> 
<6a31ef00-5f7a-d36e-d5e6-0414e8b813c7@selasky.org> <613AFD8E-72B2-4E3F-9C70-1D1E43109B8A@gmail.com> <2c9a9c2652a74d8eb4b34f5a32c7ad5c@AM5PR0502MB2916.eurprd05.prod.outlook.com> <52A2608C-A57E-4E75-A952-F4776BA23CA4@gmail.com> <9B507AA6-40FE-4B8D-853F-2A9422A2DF67@gmail.com> <64abec26-e310-d66d-93ae-3536914ddd84@yandex.ru> <86D76532-92F4-479C-A714-126D007AD91F@gmail.com> <61f98b7d-f55d-aa0f-4aef-1bdfbc7086ff@yandex.ru> From: "Andrey V. Elsukov" Openpgp: id=E6591E1B41DA1516F0C9BC0001C5EA0410C8A17A Message-ID: Date: Mon, 26 Jun 2017 19:30:22 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:52.0) Gecko/20100101 Thunderbird/52.0.1 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="LSl5TAswqMcmo5tKJaEuQNEMULCLPXSuI" X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Jun 2017 16:33:04 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --LSl5TAswqMcmo5tKJaEuQNEMULCLPXSuI Content-Type: multipart/mixed; boundary="W5Mi1AjedTkUXLfJKKc2QlKE0GnXDnkwL"; protected-headers="v1" From: "Andrey V. 
Elsukov" To: Matt Joras Cc: Ben RUBSON , FreeBSD Net , Ryan Stone , "freebsd-scsi@freebsd.org" Message-ID: Subject: Re: mbuf_jumbo_9k & iSCSI failing References: <486A6DA0-54C8-40DF-8437-F6E382DA01A8@gmail.com> <6a31ef00-5f7a-d36e-d5e6-0414e8b813c7@selasky.org> <613AFD8E-72B2-4E3F-9C70-1D1E43109B8A@gmail.com> <2c9a9c2652a74d8eb4b34f5a32c7ad5c@AM5PR0502MB2916.eurprd05.prod.outlook.com> <52A2608C-A57E-4E75-A952-F4776BA23CA4@gmail.com> <9B507AA6-40FE-4B8D-853F-2A9422A2DF67@gmail.com> <64abec26-e310-d66d-93ae-3536914ddd84@yandex.ru> <86D76532-92F4-479C-A714-126D007AD91F@gmail.com> <61f98b7d-f55d-aa0f-4aef-1bdfbc7086ff@yandex.ru> In-Reply-To: --W5Mi1AjedTkUXLfJKKc2QlKE0GnXDnkwL Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable

On 26.06.2017 19:26, Matt Joras wrote:
> I didn't think that ixgbe(4) still suffered from this problem, and we
> use it in the same situations rstone mentioned above. Indeed, ixgbe(4)
> doesn't presently suffer from this problem (you can see that in your
> patch, as it is only effectively changing the other drivers), though
> it used to. It looks like it was first fixed to not to in r280182.
>

Yes, we have actually been carrying this patch since 8.x. Recent drivers
aren't affected by this problem. iflib also has this code:

	#ifndef CONTIGMALLOC_WORKS
		else
			fl->ifl_buf_size = MJUMPAGESIZE;
	#else
		else if (sctx->isc_max_frame_size <= 4096)
			fl->ifl_buf_size = MJUMPAGESIZE;
		else if (sctx->isc_max_frame_size <= 9216)
			fl->ifl_buf_size = MJUM9BYTES;
		else
			fl->ifl_buf_size = MJUM16BYTES;
	#endif

So unless CONTIGMALLOC_WORKS is defined, it does not appear to use 9k-16k mbufs by default.

-- 
WBR, Andrey V.
Elsukov --W5Mi1AjedTkUXLfJKKc2QlKE0GnXDnkwL-- --LSl5TAswqMcmo5tKJaEuQNEMULCLPXSuI Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEzBAEBCAAdFiEE5lkeG0HaFRbwybwAAcXqBBDIoXoFAllRNp4ACgkQAcXqBBDI oXqBbwgAq7tK/6KFBl04+UzquamnCs4v85dAx65EG8gHFVAXkOSYW9rXBSieX2wU 9JPZNQmDF9eO6xv4oFHQg87bwIs6WEWKc3TO1iR+7mDycRDi/7dEEzmyi1Px4HFx 8gAnaF6VqTjixRfPRuXQ8eZXR6mKFGSVdiHwFrqZ6M6DTEZiqCxjAa7ZfF6mFSwH cs44QmzYCGP+bI6PIwF4ylI7gVgD7yWg/3zWxO0J5i3T+65+ZKAd4gznb09HxzHB R7mmoYWOsm/V9g07MlLhHkRzD9+Ozhm/dJk8F1WgP6gXvxh7etJHGuY9W7xl5Ic9 9MpgQB9xUoKrtqWPSHAX+pbx3tP94w== =Al7R -----END PGP SIGNATURE----- --LSl5TAswqMcmo5tKJaEuQNEMULCLPXSuI-- From owner-freebsd-scsi@freebsd.org Tue Jun 27 01:15:13 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8D6ADD9667B for ; Tue, 27 Jun 2017 01:15:13 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 79E7D7201C for ; Tue, 27 Jun 2017 01:15:13 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v5R1FC6D090764 for ; Tue, 27 Jun 2017 01:15:13 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-scsi@FreeBSD.org Subject: [Bug 220175] [iscsi] ctld often gets stuck in a "D" state Date: Tue, 27 Jun 2017 01:15:13 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None 
X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: jpaetzel@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: jpaetzel@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to cc bug_status Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jun 2017 01:15:13 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220175

Josh Paetzel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-scsi@FreeBSD.org    |jpaetzel@FreeBSD.org
                 CC|                            |jpaetzel@FreeBSD.org
             Status|New                         |Open

--- Comment #1 from Josh Paetzel ---
I'd be very suspicious of a problem with the disk subsystem in this system.
Can you describe the disk subsystem and filesystem please?
-- 
You are receiving this mail because:
You are the assignee for the bug.
From owner-freebsd-scsi@freebsd.org Tue Jun 27 05:13:22 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0FA43DA0F91; Tue, 27 Jun 2017 05:13:22 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-wm0-x229.google.com (mail-wm0-x229.google.com [IPv6:2a00:1450:400c:c09::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 911E87BDAB; Tue, 27 Jun 2017 05:13:21 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: by mail-wm0-x229.google.com with SMTP id b184so15089553wme.1; Mon, 26 Jun 2017 22:13:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=mXHIc7XZRwI1tVFy4U5Y05uxFRGexqrwlKrlqodjHzQ=; b=gt06FYWtXnbbSbtYM0mL22+dmjpcOIyjXhwYTMigK7VZyiZw95g0tsF6c3jj2E4otS eBXiXymwMSSF1NfP7zNoGNmCUUKwqBFXbf05MBlVK278wQfa2qpoA1Y95HBTNeqXs27b yGMnoi7c3jtirbCQ7Yqv14OHMPwcaZSuamfLnlX7qwXqIMdNYhwTAjqY209Ud40T7atr oGEIJEeKzy/HzYOyJuE0qhqeBN/gxxffUAInsPq73cdSkjuY9ztTPm29d2FK+VSlkCHf 3/ASf7yuP6sygy5pqVr6S4Vjds/LQi6kSXYmmBTJOKaVywwmFWK8lNkXpz8NX/Er5ruv HYpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=mXHIc7XZRwI1tVFy4U5Y05uxFRGexqrwlKrlqodjHzQ=; b=grHq/+mrV2SgKsu7J9xQy9WLaWGyTONi+ZMRSJOe8cCGzH9FFuIDHrxH4W2IksaHK6 hTi3XUFv4m4ts0BZUlKSshK2UaEjijvbxr8F+hGaifC9k10h6qyomFaMyzJ02S1zVDqs op9VK6ZMSjsQP4Iehg5LaHsVCBhRUJCrDEV0pEgORbG70R4GZTuEgp+pSSnFuf+wB9bW P2IzsbZcD+9QDxWlzyKOvxvG+mQWBe2dBrlFS8cOkrDDyCaPwCPBHIXrsQlpgnwZIPVC
PBmF30gCT3nUMAoXNUHX19a1H9lLAFSreDMgpV/y5U7D1Fsh0dPiiMphjZxi5mgkiM3R JLjw== X-Gm-Message-State: AKS2vOxIFQ1anzdM7Bxn/6U7/R4PZc+VQLe5qnDNFHHDpRGvt/YgCnS7 rR1M2Q/t+1PWplq/l7ADtwbJiVK42A== X-Received: by 10.80.161.69 with SMTP id 63mr2469430edj.142.1498540399677; Mon, 26 Jun 2017 22:13:19 -0700 (PDT) MIME-Version: 1.0 Received: by 10.80.183.176 with HTTP; Mon, 26 Jun 2017 22:13:18 -0700 (PDT) In-Reply-To: <14CB3F50-0426-48BD-838C-943B6D15FEB9@gmail.com> References: <486A6DA0-54C8-40DF-8437-F6E382DA01A8@gmail.com> <6a31ef00-5f7a-d36e-d5e6-0414e8b813c7@selasky.org> <613AFD8E-72B2-4E3F-9C70-1D1E43109B8A@gmail.com> <2c9a9c2652a74d8eb4b34f5a32c7ad5c@AM5PR0502MB2916.eurprd05.prod.outlook.com> <52A2608C-A57E-4E75-A952-F4776BA23CA4@gmail.com> <9B507AA6-40FE-4B8D-853F-2A9422A2DF67@gmail.com> <14CB3F50-0426-48BD-838C-943B6D15FEB9@gmail.com> From: Zaphod Beeblebrox Date: Tue, 27 Jun 2017 01:13:18 -0400 Message-ID: Subject: Re: mbuf_jumbo_9k & iSCSI failing To: Ben RUBSON Cc: Ryan Stone , FreeBSD Net , "freebsd-scsi@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jun 2017 05:13:22 -0000 Don't forget that, generally, as I understand it, the network stack suffers from the same problem for 9k buffers. On Sun, Jun 25, 2017 at 12:56 PM, Ben RUBSON wrote: > > On 25 Jun 2017, at 17:32, Ryan Stone wrote: > > > > Having looking at the original email more closely, I see that you showed > an mlxen interface with a 9020 MTU. Seeing allocation failures of 9k mbuf > clusters increase while you are far below the zone's limit means that > you're definitely running into the bug I'm describing, and this bug could > plausibly cause the iSCSI errors that you describe. 
> > > > The issue is that the newer version of the driver tries to allocate a > single buffer to accommodate an MTU-sized packet. Over time, however, > memory will become fragmented and eventually it can become impossible to > allocate a 9k physically contiguous buffer. When this happens the driver > is unable to allocate buffers to receive packets and is forced to drop > them. Presumably, if iSCSI suffers too many packet drops it will terminate > the connection. The older version of the driver limited itself to > page-sized buffers, so it was immune to issues with memory fragmentation. > > Thank you for your explanation Ryan. > You say "over time", and you're right, I have to wait several days (here > 88) before the problem occurs. > Strange however that in 2500MB free memory system is unable to find 9k > physically contiguous. But we never know :) > > Let's then wait for your patch ! > (and reboot for now) > > Many thx ! > > Ben > _______________________________________________ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-scsi@freebsd.org Tue Jun 27 12:57:26 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0880BDAA035 for ; Tue, 27 Jun 2017 12:57:26 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EB46865333 for ; Tue, 27 Jun 2017 12:57:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v5RCvPnb017248 for ; Tue, 27 
Jun 2017 12:57:25 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-scsi@FreeBSD.org Subject: [Bug 220094] [scsi] sys/cam/scsi/scsi_sa.c: a sleep-under-mutex bug in saioctl Date: Tue, 27 Jun 2017 12:57:25 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: commit-hook@freebsd.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: ken@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jun 2017 12:57:26 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220094

--- Comment #4 from commit-hook@freebsd.org ---
A commit references this bug:

Author: ken
Date: Tue Jun 27 12:56:37 UTC 2017
New revision: 320405
URL: https://svnweb.freebsd.org/changeset/base/320405

Log:
  MFC r320123:

  Fix a potential sleep while holding a mutex in the sa(4) driver.

  If the user issues a MTIOCEXTGET ioctl, and the tape drive in question
  has a serial number that is longer than 80 characters, we malloc a
  buffer in saextget() to hold the output of cam_strvis().

  Since a mutex is held in that codepath, doing a M_WAITOK malloc could
  lead to sleeping while holding a mutex. Change it to a M_NOWAIT malloc
  and bail out if we fail to allocate the memory.
  Devices with serial numbers longer than 80 bytes are very rare (I don't
  recall seeing one), so this should be a very unusual case to hit. But it
  is a bug that should be fixed.

  sys/cam/scsi/scsi_sa.c:
    In saextget(), if we need to malloc a buffer to hold the output of
    cam_strvis(), don't wait for the memory. Fail and return an error if
    we can't allocate the memory immediately.

  PR: kern/220094
  Submitted by: Jia-Ju Bai
  Sponsored by: Spectra Logic
  Approved by: re (gjb)

Changes:
_U  stable/11/
  stable/11/sys/cam/scsi/scsi_sa.c

-- 
You are receiving this mail because:
You are on the CC list for the bug.
From owner-freebsd-scsi@freebsd.org Tue Jun 27 14:12:59 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A6C46D8667B for ; Tue, 27 Jun 2017 14:12:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9500867E28 for ; Tue, 27 Jun 2017 14:12:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v5RECwvV046854 for ; Tue, 27 Jun 2017 14:12:59 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-scsi@FreeBSD.org Subject: [Bug 220094] [scsi] sys/cam/scsi/scsi_sa.c: a sleep-under-mutex bug in saioctl Date: Tue, 27 Jun 2017 14:12:59 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: ken@FreeBSD.org X-Bugzilla-Status: Closed
X-Bugzilla-Resolution: FIXED X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: ken@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: resolution bug_status Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jun 2017 14:12:59 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220094

Kenneth D. Merry changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|New                         |Closed

--- Comment #5 from Kenneth D. Merry ---
Fixed in head, stable/11 and stable/10.

Thank you for the bug report!

-- 
You are receiving this mail because:
You are on the CC list for the bug.
From owner-freebsd-scsi@freebsd.org Tue Jun 27 19:26:48 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DE9C6D8E7E8 for ; Tue, 27 Jun 2017 19:26:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B4D527A0DF for ; Tue, 27 Jun 2017 19:26:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v5RJQmPD036644 for ; Tue, 27 Jun 2017 19:26:48 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-scsi@FreeBSD.org
Subject: [Bug 219701] crash in camperiphfree() Date: Tue, 27 Jun 2017 19:26:48 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: commit-hook@freebsd.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: ken@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jun 2017 19:26:49 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219701

--- Comment #4 from commit-hook@freebsd.org ---
A commit references this bug:

Author: ken
Date: Tue Jun 27 19:26:03 UTC 2017
New revision: 320421
URL: https://svnweb.freebsd.org/changeset/base/320421

Log:
  Fix a panic in camperiphfree().

  If a peripheral driver (e.g. da, sa, cd) is added or removed from the
  peripheral driver list while an unrelated peripheral driver instance
  (e.g. da0, sa5, cd2) is going away and is inside camperiphfree(), we
  could dereference an invalid pointer.

  When peripheral drivers are added or removed (see
  periphdriver_register() and periphdriver_unregister()), the peripheral
  driver array is resized and existing entries are moved.

  Although we hold the topology lock while we traverse the peripheral
  driver list, we retain a pointer to the location of the peripheral
  driver pointer and then drop the topology lock. So we are still
  vulnerable to the list getting moved around while the lock is dropped.
  To solve the problem, cache a copy of the peripheral driver pointer.
  If its storage location in the list changes while we have the lock
  dropped, it won't have any effect.

  This doesn't solve the issue that peripheral drivers ("da", "cd", as
  opposed to individual instances like "da0", "cd0") are not generally
  part of a reference counting scheme to guard against deregistering them
  while there are instances active. The caller (generally the person
  unloading a module) has to be aware of active drivers and not unload
  something that is in use.

  sys/cam/cam_periph.c:
    In camperiphfree(), cache a pointer to the peripheral driver
    instance to avoid holding a pointer to an invalid memory location
    in the event that the peripheral driver list changes while we have
    the topology lock dropped.

  PR: kern/219701
  Submitted by: avg
  MFC after: 3 days
  Sponsored by: Spectra Logic

Changes:
  head/sys/cam/cam_periph.c

-- 
You are receiving this mail because:
You are on the CC list for the bug.
From owner-freebsd-scsi@freebsd.org Thu Jun 29 11:49:05 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5E4B3D9B030; Thu, 29 Jun 2017 11:49:05 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wm0-x236.google.com (mail-wm0-x236.google.com [IPv6:2a00:1450:400c:c09::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DF8178152D; Thu, 29 Jun 2017 11:49:04 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wm0-x236.google.com with SMTP id w126so79655522wme.0; Thu, 29 Jun 2017 04:49:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:content-transfer-encoding:subject:message-id:date:to :mime-version;
bh=/0PeDsqUvWY8Rmc0HQnOBmBLDJcNSSBrJMtjHVZ6l44=; b=MqL772KK6OUCZ5UIA3Zd8ub2P189RwetjkCP4Eq+B6NwMecz8Pu521g6UqLfItQad/ N5TqNeBmcSGzOrhe3qSDZKGJv3x3zGO1WLWdSSLRbZN07bIdG0FgU3Gf0OuHt0Tffj+h QNc2UImcMRNrT6BdgZY2Fdrsr/yazQh5/yuDzTEHug1QvEZ1QLzmL8j3LF5YTm7E7JEh FHAl6FM+WI+0n/ibEioAb/w+nrVLPbQG2IjLJL70yIIbq/J+Z+uW/1Kfc5ZNCShVPqvg pUPueR0lBehmB6V/99APUoQ/uIqVXGMhfYV5LnfU2Mrn1vISutM3JNn1J7ULaCc+q4e4 jvXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:subject :message-id:date:to:mime-version; bh=/0PeDsqUvWY8Rmc0HQnOBmBLDJcNSSBrJMtjHVZ6l44=; b=E0Ex8duggTxTwIFvuj8Bm2xIoq2fPS2Jq8MyrPuVOK2hLN+DRfOUzWuFgS6xgOlLZd dafCyvU2Z/5AIjy8FJdJwrIWq+sawdGWEPxeQH6gHiWD/3dIAY1UPBH8McHSl2fkWwgk pKH5/vmC/v6+YhwFUam3L/ZXjQDi3PMFCI4Yi3qQO2aM0F7/5lJQaDOW9rd4OgCwpUtV U/tN7XjPpV+5ALPk9jaL66kYGQJvnBwdINCDvVYeuqhBAxjqac57Q4S6ONg4gmhEctRf ydm7uWUR5eb7epJvQeT9DzoMQ2NyrmCv1vxjW4yQglJ4IgZfpngsEutXmLhhyF9gOGv8 Nubw== X-Gm-Message-State: AIVw113MC5e1+WQMCq0C6Yf4xRLduPmLDZ5g66oIU/0D5A8CLChNYQSY d/YvwCedvHurm4rZ+y4= X-Received: by 10.28.211.10 with SMTP id k10mr1566474wmg.117.1498736942396; Thu, 29 Jun 2017 04:49:02 -0700 (PDT) Received: from ben.home (LFbn-1-11339-180.w2-15.abo.wanadoo.fr. [2.15.165.180]) by smtp.gmail.com with ESMTPSA id r142sm1148896wmg.24.2017.06.29.04.49.00 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 29 Jun 2017 04:49:01 -0700 (PDT) From: Ben RUBSON Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Subject: I/O to pool appears to be hung, panic ! 
Message-Id: Date: Thu, 29 Jun 2017 13:48:59 +0200 To: Freebsd fs , freebsd-scsi@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2017 11:49:05 -0000

Hello,

One of my servers panicked last night with the following message:

panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.

The pool is laid out like this:

home
  mirror
    label/G13local
    label/G14local
    label/G23iscsi <-- busy disk
    label/G24iscsi
  mirror
    label/G15local
    label/G16local
    label/G25iscsi
    label/G26iscsi
cache
  label/G10local
  label/G11local

So the kernel is complaining about one of the 4 iSCSI disks in the pool.
All 4 of these disks come from another, identical FreeBSD system (40G "no latency" link).

Here are some numbers for this disk, taken from the server hosting the pool
(unfortunately not from the iSCSI target server):
https://s23.postimg.org/zd8jy9xaj/busydisk.png

We clearly see that the disk suddenly became 100% busy, while the CPU was almost idle.

There was no error message at all on either server.
SMART data from the target disk looks fine:
SMART Health Status: OK
Current Drive Temperature: 32 C
Drive Trip Temperature: 85 C
Manufactured in week 22 of year 2016
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 18
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 2362
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 5938879802638336

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0       14         0        14     488481      74496.712           0
write:         0        0         0         0     126701      18438.443           0
verify:        0        0         0         0      20107          0.370           0

Non-medium error count: 0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -    7943                 - [-   -    -]
# 2  Background long   Completed                   -    7607                 - [-   -    -]
# 3  Background long   Completed                   -    7271                 - [-   -    -]

The only log I have is the following stack trace, taken from the server console:
panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b240f7 at kdb_backtrace+0x67
#1 0xffffffff80ad9462 at vpanic+0x182
#2 0xffffffff80ad92d3 at panic+0x43
#3 0xffffffff82238fa7 at vdev_deadman+0x127
#4 0xffffffff82238ec0 at vdev_deadman+0x40
#5 0xffffffff82238ec0 at vdev_deadman+0x40
#6 0xffffffff8222d0a6 at spa_deadman+0x86
#7 0xffffffff80af32da at softclock_call_cc+0x18a
#8 0xffffffff80af3854 at softclock+0x94
#9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
#10 0xffffffff80a936f6 at ithread_loop+0xc6
#11 0xffffffff80a900d5 at fork_exit+0x85
#12 0xffffffff80f846fe at fork_trampoline+0xe
Uptime: 92d2h47m6s

I would have liked to make a dump available.
However, despite my (correct?) configuration, the server did not dump
(although "sysctl debug.kdb.panic=1" does make it dump):

# grep ^dump /boot/loader.conf /etc/rc.conf
/boot/loader.conf:dumpdev="/dev/mirror/swap"
/etc/rc.conf:dumpdev="AUTO"

# gmirror list swap
Components: 2
Balance: prefer
Providers:
1. Name: mirror/swap
   Mediasize: 8589934080 (8.0G)
Consumers:
1. Name: label/swap1
   State: ACTIVE
   Priority: 0
2. Name: label/swap2
   State: ACTIVE
   Priority: 1

I use the default kernel, with a rebuilt ZFS module:
# uname -v
FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

I use the following iSCSI configuration, which disconnects the disks "as soon as" they become unavailable:
kern.iscsi.ping_timeout=5
kern.iscsi.fail_on_disconnection=1
kern.iscsi.iscsid_timeout=5

I therefore think the disk was at least reachable during these 20 busy minutes.

So, any idea why I could have faced this issue?
I would have thought ZFS would take the busy device offline instead of raising a panic.
Perhaps it is already possible to make ZFS behave like this?

Thank you very much for your help & support!
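The backtrace above shows the panic being raised from spa_deadman() and vdev_deadman(), i.e. the ZFS deadman timer. As an illustration only, here is a simplified Python model of such a check; it is not the ZFS implementation. It assumes that an I/O counts as hung once it has been outstanding longer than a threshold (believed to be vfs.zfs.deadman_synctime_ms on FreeBSD 11, default 1000000 ms, about 16.7 minutes; verify with sysctl on the system), and that with vfs.zfs.deadman_enabled=0 the panic path is skipped. PendingIO and deadman_check are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed default of vfs.zfs.deadman_synctime_ms on FreeBSD 11 (check with sysctl).
DEFAULT_SYNCTIME_MS = 1_000_000


@dataclass
class PendingIO:
    """Hypothetical model of an I/O handed to a leaf vdev and not yet completed."""
    vdev: str
    issued_ms: int


def deadman_check(pending: List[PendingIO], now_ms: int,
                  synctime_ms: int = DEFAULT_SYNCTIME_MS,
                  enabled: bool = True) -> Optional[str]:
    """Return the panic message this simplified model would raise, or None."""
    if not enabled:
        # Model of deadman_enabled=0: no panic, but nothing completes
        # or offlines the stalled I/O either.
        return None
    # Look at the oldest outstanding I/O first.
    for io in sorted(pending, key=lambda p: p.issued_ms):
        if now_ms - io.issued_ms > synctime_ms:
            return f"I/O appears to be hung at '{io.vdev}'"
    return None


if __name__ == "__main__":
    minute = 60_000
    stalled = [PendingIO("/dev/label/G23iscsi", issued_ms=0)]
    for elapsed in (10, 20):
        print(elapsed, "min:", deadman_check(stalled, now_ms=elapsed * minute))
    print("disabled:", deadman_check(stalled, now_ms=20 * minute, enabled=False))
```

In this model, disabling the deadman removes only the panic: nothing takes the stalled device offline or completes its I/O. If the assumed default applies, a single I/O stuck for about 17 minutes is enough to trigger the panic, which is compatible with the roughly 20 busy minutes seen in the graph.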
Best regards, Ben From owner-freebsd-scsi@freebsd.org Thu Jun 29 13:36:50 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3896DD9DA79; Thu, 29 Jun 2017 13:36:50 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: from mail-wr0-x22c.google.com (mail-wr0-x22c.google.com [IPv6:2a00:1450:400c:c0c::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BF9A784B72; Thu, 29 Jun 2017 13:36:49 +0000 (UTC) (envelope-from ben.rubson@gmail.com) Received: by mail-wr0-x22c.google.com with SMTP id 77so189723290wrb.1; Thu, 29 Jun 2017 06:36:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=+McbihPIfeGbcGMjhpqCdlCVmu56lMnabX+qRvak6gA=; b=pyYZmvTpY7dmtUS9n4ncGm6giAdNt48yKoIuLdiMbqS6reZvnpNjrEmBgXdzMp46eG tPawmK6TDvaeZ8887eHzCP6AgWUAj5E1Rvi1uIhgYlYHYRGRw1a8SU3mi7JsR/JSnMn8 Gqs0I8hwFmIsVsl4bpz5AmSfz8f1Hh+uuLnTstu3yYqCJIzMvMVTSqd17h80/2GfNQgR LK9clCpjH9WnBERnPbF9FbcK8a9+RC29Bkb1T33pu+MysWu6WJkicg8Alh0hVymtuQJO dkIZXO3e/3HqSuAUBXNlwMYqenGboShzSVl+pd/US4yp8EgvTXYK/LUlO5V2EeVRpO3S Ej5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to; bh=+McbihPIfeGbcGMjhpqCdlCVmu56lMnabX+qRvak6gA=; b=p5cxOjMt8eu7VVlvPrZg0D68pU6/4D9bgT0bmNeRI9EtJP/jbM0kT5OvmDXi1t9jBg 4kRxtRy/mZNAQRMN/K5AZnxOJHHJ8kT/3Xz0niDoHPq7pvRsibYXBEWehGmXBjpVoLBl OgvP3lgn4rIiSfbyzP6MQ7PcuwC5r8goPoMvmvQXX3Jh/Z54hgiwsykhXKRdNFbUlA/M JePcyyAO2suGmOknG9KggmNf101+oIpT66q71tcWi8SDgBtGQsAaNRRyADj/bNhtczRn 
iKLQnx85+ZtQBtOk+CQO8NOdYnTdDwnLE1i80qip//uGZ48nlLB9svSN4xjlUYiOxDp7 V1jQ== X-Gm-Message-State: AKS2vOwYt7y/W6mDHhXHsk8cDaAfAwiYbeGjqzwR5W2yxR2wPlpaRRLx 4nl9NJgd/q7+ynWQe/8= X-Received: by 10.223.143.10 with SMTP id p10mr21583804wrb.120.1498743407648; Thu, 29 Jun 2017 06:36:47 -0700 (PDT) Received: from ben.home (LFbn-1-11339-180.w2-15.abo.wanadoo.fr. [2.15.165.180]) by smtp.gmail.com with ESMTPSA id b197sm1498890wmb.4.2017.06.29.06.36.46 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 29 Jun 2017 06:36:47 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: I/O to pool appears to be hung, panic ! From: Ben RUBSON In-Reply-To: <20170629144334.1e283570@fabiankeil.de> Date: Thu, 29 Jun 2017 15:36:45 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20170629144334.1e283570@fabiankeil.de> To: Freebsd fs , freebsd-scsi@freebsd.org X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2017 13:36:50 -0000

> On 29 Jun 2017, at 14:43, Fabian Keil wrote:

Thank you for your feedback, Fabian.

> Ben RUBSON wrote:
>
>> One of my servers did a kernel panic last night, giving the following message :
>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> [...]
>> Here are some numbers regarding this disk, taken from the server hosting the pool :
>> (unfortunately not from the iscsi target server)
>> https://s23.postimg.org/zd8jy9xaj/busydisk.png
>>
>> We clearly see that suddenly, disk became 100% busy, meanwhile CPU was almost idle.
>>
>> No error message at all on both servers.
> [...]
>> The only log I have is the following stacktrace taken from the server console :
>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xffffffff80b240f7 at kdb_backtrace+0x67
>> #1 0xffffffff80ad9462 at vpanic+0x182
>> #2 0xffffffff80ad92d3 at panic+0x43
>> #3 0xffffffff82238fa7 at vdev_deadman+0x127
>> #4 0xffffffff82238ec0 at vdev_deadman+0x40
>> #5 0xffffffff82238ec0 at vdev_deadman+0x40
>> #6 0xffffffff8222d0a6 at spa_deadman+0x86
>> #7 0xffffffff80af32da at softclock_call_cc+0x18a
>> #8 0xffffffff80af3854 at softclock+0x94
>> #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
>> #10 0xffffffff80a936f6 at ithread_loop+0xc6
>> #11 0xffffffff80a900d5 at fork_exit+0x85
>> #12 0xffffffff80f846fe at fork_trampoline+0xe
>> Uptime: 92d2h47m6s
>>
>> I would have been pleased to make a dump available.
>> However, despite my (correct ?) configuration, server did not dump :
>> (nevertheless, "sysctl debug.kdb.panic=1" make it to dump)
>> # grep ^dump /boot/loader.conf /etc/rc.conf
>> /boot/loader.conf:dumpdev="/dev/mirror/swap"
>> /etc/rc.conf:dumpdev="AUTO"
>
> You may want to look at the NOTES section in gmirror(8).

Yes, I should already be OK (the prefer balance algorithm is set).

>> I use default kernel, with a rebuilt zfs module :
>> # uname -v
>> FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
>>
>> I use the following iSCSI configuration, which disconnects the disks "as soon as" they are unavailable :
>> kern.iscsi.ping_timeout=5
>> kern.iscsi.fail_on_disconnection=1
>> kern.iscsi.iscsid_timeout=5
>>
>> I then think disk was at least correctly reachable during these 20 busy minutes.
>>
>> So, any idea why I could have faced this issue ?
>
> Is it possible that the system was under memory pressure?
No, I don't think it was:
https://s1.postimg.org/uvsebpyyn/busydisk2.png
More than 2GB of available memory.
Swap not used (624kB).
ARC behaviour seems correct (anon increases because ZFS can't actually write, I think).
Regarding the pool itself, it was receiving data at 6MB/s, sending blocks of around 30kB to the disks.
When the disk went busy, throughput fell to a few kB/s, with 128kB blocks.

> geli's use of malloc() is known to cause deadlocks under memory pressure:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
>
> Given that gmirror uses malloc() as well it probably has the same issue.

I don't use geli, so I should not face this issue.

>> I would have thought ZFS would have taken the busy device offline, instead of raising a panic.
>> Perhaps it is already possible to make ZFS behave like this ?
>
> There's a tunable for this: vfs.zfs.deadman_enabled.
> If the panic is just a symptom of the deadlock it's unlikely
> to help though.

I think this tunable would have prevented the server from raising a panic:
# sysctl -d vfs.zfs.deadman_enabled
vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
# sysctl vfs.zfs.deadman_enabled
vfs.zfs.deadman_enabled: 1

But I am not sure how it would have behaved then...
(busy disk miraculously back to normal status, memory pressure due to anon increasing...)

I also looked for LSI SAS2008 error counters (on the target side), but did not find anything interesting.
(sysctl -a | grep -i mps) Thank you again, Ben From owner-freebsd-scsi@freebsd.org Thu Jun 29 18:49:19 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E7A05DA2FFC; Thu, 29 Jun 2017 18:49:19 +0000 (UTC) (envelope-from karli@inparadise.se) Received: from mail.inparadise.se (h-246-50.A444.priv.bahnhof.se [155.4.246.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8732A6EA23; Thu, 29 Jun 2017 18:49:18 +0000 (UTC) (envelope-from karli@inparadise.se) Received: from localhost (localhost [127.0.0.1]) by mail.inparadise.se (Postfix) with ESMTP id 822DE489C4; Thu, 29 Jun 2017 20:40:44 +0200 (CEST) Received: from mail.inparadise.se ([127.0.0.1]) by localhost (mail.inparadise.se [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 9d2JQulMdY3v; Thu, 29 Jun 2017 20:40:43 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by mail.inparadise.se (Postfix) with ESMTP id AAA0C489C5; Thu, 29 Jun 2017 20:40:43 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.inparadise.se AAA0C489C5 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=inparadise.se; s=ECF0F226-2F14-11E7-BBE9-ECFEB9BC1D67; t=1498761643; bh=9yazR+QulQ43ITdDjWG9Ngjsw2oS+5gYOLYKwpKRD1g=; h=Date:Message-ID:To:From:MIME-Version; b=1pEzq2qWIrvaRlN9pbTbKiaS6UfArx6QD2/uaFZzp4Eyc+Vi3pf4OGCFrBiyS592H AZ2y+ECav8z6Q6fPnLZTUdWU51gPiGguteaWpeKPvICmoSbTuE5YR6uKWQ/R/dp54R hg8l6x1MwlRRGeOdzJDNwFJPsFYPyLRjDTC3ivrO+4RQczNMDJwL/+UPR+EAPmO6vv 6CGNTyZopUNChvaTCzJMU3W+R7ky38Yl6vfdS1zprmqUC0MRkintO6BQS32LIOkeXu bM7LlQj0yTAKfnJkh8hmp83LkEqDL+yMLGs4zCnY+ipNqE0l+HF94v9UCbJcnW+OJu hCs8TeH8TPGSQ== X-Virus-Scanned: amavisd-new at inparadise.se Received: from mail.inparadise.se ([127.0.0.1]) by localhost (mail.inparadise.se [127.0.0.1]) (amavisd-new, port 10026) with 
ESMTP id L-VYpFlZt8xy; Thu, 29 Jun 2017 20:40:43 +0200 (CEST) Received: from mail.inparadise.se (localhost [127.0.0.1]) by mail.inparadise.se (Postfix) with ESMTP id 88D33489C4; Thu, 29 Jun 2017 20:40:43 +0200 (CEST) Date: Thu, 29 Jun 2017 20:40:43 +0200 (CEST) Subject: Re: I/O to pool appears to be hung, panic ! Message-ID: X-Android-Message-ID: To: Ben RUBSON Cc: freebsd-scsi@freebsd.org, Freebsd fs Importance: Normal X-Priority: 3 X-MSMail-Priority: Normal From: =?utf-8?B?S2FybGkgU2rDtmJlcmc=?= X-Originating-IP: [172.16.1.154, 127.0.0.1] X-Mailer: Zimbra 8.7.1_GA_1670 (Android-Mail/7.6.4.158567011.release(...883836) devip=172.16.1.154 ZPZB/66) Thread-Index: elz8IC2KjbrK5W6ahkbM30aqsX5MTA== Thread-Topic: I/O to pool appears to be hung, panic ! X-Mailman-Approved-At: Thu, 29 Jun 2017 19:15:16 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2017 18:49:20 -0000 From owner-freebsd-scsi@freebsd.org Thu Jun 29 23:15:21 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6B29CDA79A8 for ; Thu, 29 Jun 2017 23:15:21 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5932E77513 for ; Thu, 29 Jun 2017 23:15:21 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id v5TNFKZk010628 for ; Thu, 29 Jun 2017 23:15:21 
GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-scsi@FreeBSD.org Subject: [Bug 220371] [patch] camdd: Add support for other protocols Date: Thu, 29 Jun 2017 23:15:21 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: bin X-Bugzilla-Version: CURRENT X-Bugzilla-Keywords: patch X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: cem@freebsd.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-scsi@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2017 23:15:21 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220371

Conrad Meyer changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-bugs@FreeBSD.org    |freebsd-scsi@FreeBSD.org
                 CC|                            |cem@freebsd.org

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-scsi@freebsd.org Fri Jun 30 19:17:56 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1E494D98B93 for ; Fri, 30 Jun 2017 19:17:56 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 059467994B for ; Fri, 30 Jun 2017 19:17:56 +0000
(UTC) (envelope-from freebsd@omnilan.de) Received: by mailman.ysv.freebsd.org (Postfix) id 04FA6D98B91; Fri, 30 Jun 2017 19:17:56 +0000 (UTC) Delivered-To: scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 049C6D98B90 for ; Fri, 30 Jun 2017 19:17:56 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AA87379948; Fri, 30 Jun 2017 19:17:55 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (mh0.gentlemail.de [IPv6:2a00:e10:2800::a135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v5UJHq88050537; Fri, 30 Jun 2017 21:17:52 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (s1.omnilan.de [217.91.127.234]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id C736F45B; Fri, 30 Jun 2017 21:17:51 +0200 (CEST) Message-ID: <5956A3DF.8060109@omnilan.de> Date: Fri, 30 Jun 2017 21:17:51 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Alexander Motin , scsi@freebsd.org Subject: bhyve ahcich0: Timeout on slot 0 port 0, , regression with stable/11->releng/11.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]); Fri, 30 Jun 2017 21:17:52 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: ; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: 
list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2017 19:17:56 -0000

Hello,

on releng/11.1 I noticed a severe performance degradation during file
unlinking in a FreeBSD guest. The host was running a quite recent stable/11
before.

On the host, the VM is started with
ahci,hd:/dev/adaN

The guest attaches:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ACS-2 ATA SATA 3.x device

The guest has very high Sys-load during unlinking (50-75%@2 cores).
Also, the host logs these errors:
ahcich0: Timeout on slot 0 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb7ffe
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb7bfe
ahcich0: Timeout on slot 14 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb3bfe
ahcich0: Timeout on slot 17 port 0
…
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617
ahcich0: ... waiting for slots 00018000
ahcich0: Timeout on slot 15 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617
ahcich0: ... waiting for slots 00010000
ahcich0: Timeout on slot 16 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 e8 30 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 e8 30 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 e8 30 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
…
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 ff 44 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
…
And so on.

I have always seen a performance penalty using ahci instead of virtio-blk, most
likely due to TRIM support, but never noticed such a huge difference:
deleting an obj tree takes <1 min with virtio-blk and usually took about 8
minutes with ahci on stable/11.
Now (releng/11.1) it takes >20 min (not yet finished) and I get a great many
of these errors.

Can someone (mav?) interpret the command errors and tell whether this could be
a new problem due to recent MFCs?
I will bisect stable/11 revisions to see where it starts if nobody has a
quick idea about the cause.
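To help read the log above, here is a hedged Python sketch that decodes its fields. It rests on assumptions not confirmed in this thread: that the 12 ACB bytes follow the order believed to be printed by FreeBSD's ata_cmd_string() (command, features, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp); that "is" is the AHCI PxIS register, with bit names from the AHCI 1.3 specification; and that "waiting for slots" is the running-slot mask ("rs") with already timed-out slots cleared. decode_acb, slots, pxis_flags and still_waiting are hypothetical helper names.

```python
"""Hypothetical helpers for reading the ahcich0/ada0 timeout lines.

Assumptions (not confirmed by the thread):
  * ACB byte order: command, features, lba_low, lba_mid, lba_high, device,
    lba_low_exp, lba_mid_exp, lba_high_exp, features_exp,
    sector_count, sector_count_exp.
  * "is" is the AHCI PxIS register (bit names from the AHCI 1.3 spec).
  * "waiting for slots" = running slots with the timed-out slots cleared.
"""

ATA_COMMANDS = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0x64: "SEND FPDMA QUEUED",
}

PXIS_BITS = {
    0: "DHRS", 1: "PSS", 2: "DSS", 3: "SDBS", 4: "UFS", 5: "DPS",
    6: "PCS", 7: "DMPS", 22: "PRCS", 23: "IPMS", 24: "OFS",
    26: "INFS", 27: "IFS", 28: "HBDS", 29: "HBFS", 30: "TFES", 31: "CPDS",
}


def decode_acb(acb: str) -> dict:
    """Split a 12-byte ACB dump into command name, 48-bit LBA and length."""
    b = [int(tok, 16) for tok in acb.split()]
    if len(b) != 12:
        raise ValueError(f"expected 12 ACB bytes, got {len(b)}")
    cmd, feat, lba0, lba1, lba2, _dev, lba3, lba4, lba5, feat_exp, _cnt, _cnt_exp = b
    lba = lba0 | lba1 << 8 | lba2 << 16 | lba3 << 24 | lba4 << 32 | lba5 << 40
    # For the FPDMA QUEUED commands the transfer length is carried in the
    # features fields (a value of 0 would mean 65536 blocks).
    length = (feat | feat_exp << 8) or 65536
    return {"command": ATA_COMMANDS.get(cmd, f"0x{cmd:02x}"), "lba": lba, "blocks": length}


def slots(mask: int) -> list:
    """List the NCQ slot numbers (0-31) whose bits are set in a 32-bit mask."""
    return [i for i in range(32) if (mask >> i) & 1]


def pxis_flags(value: int) -> list:
    """Name the PxIS interrupt bits that are set in value."""
    return [name for bit, name in sorted(PXIS_BITS.items()) if (value >> bit) & 1]


def still_waiting(running: int, timed_out: list) -> int:
    """Running-slot mask with the given timed-out slots cleared."""
    for slot in timed_out:
        running &= ~(1 << slot)
    return running & 0xFFFFFFFF


if __name__ == "__main__":
    for acb in ("61 40 00 e8 30 40 04 00 00 00 00 00",
                "61 40 40 e8 30 40 04 00 00 00 00 00",
                "64 01 00 00 00 40 00 00 00 00 00 00"):
        d = decode_acb(acb)
        print(f"{d['command']:<20} lba=0x{d['lba']:08x} blocks={d['blocks']}")
    print("is 00000008 ->", pxis_flags(0x00000008))
    print("rs fffb7fff ->", len(slots(0xFFFB7FFF)), "slots running")
    print(f"after slots 0, 10: {still_waiting(0xFFFB7FFF, [0, 10]):08x}")
```

Decoded this way, consecutive WRITE_FPDMA_QUEUED entries are 64-block writes to adjacent LBAs (0x0430e800, then 0x0430e840), and each "waiting for slots" mask equals the previous one with the slot that just timed out cleared (fffb7ffe without slot 10 gives fffb7bfe). The decoding alone cannot tell whether the stall originates in the guest or in the emulated controller on the host.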
Thanks, -harry From owner-freebsd-scsi@freebsd.org Fri Jun 30 19:22:11 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4FA22D98DFF for ; Fri, 30 Jun 2017 19:22:11 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 35FF479C8A for ; Fri, 30 Jun 2017 19:22:11 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: by mailman.ysv.freebsd.org (Postfix) id 351FED98DFE; Fri, 30 Jun 2017 19:22:11 +0000 (UTC) Delivered-To: scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 321D3D98DFD for ; Fri, 30 Jun 2017 19:22:11 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C6E4C79C72; Fri, 30 Jun 2017 19:22:10 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (ezra.dcm1.omnilan.net [IPv6:2a00:e10:2800::a135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v5UJM9HT050617; Fri, 30 Jun 2017 21:22:09 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (s1.omnilan.de [217.91.127.234]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id B0A4845D; Fri, 30 Jun 2017 21:22:08 +0200 (CEST) Message-ID: <5956A4E0.3030108@omnilan.de> Date: Fri, 30 Jun 2017 21:22:08 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 
Thunderbird/3.1.2 MIME-Version: 1.0 To: Alexander Motin , scsi@freebsd.org Subject: Re: bhyve ahcich0: Timeout on slot 0 port 0, , regression with stable/11->releng/11.1 References: <5956A3DF.8060109@omnilan.de> In-Reply-To: <5956A3DF.8060109@omnilan.de> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]); Fri, 30 Jun 2017 21:22:09 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: ; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2017 19:22:11 -0000

Regarding Harry Schmalzbauer's message of 30.06.2017 21:17 (localtime):
> Hello,
>
> on releng/11.1 I noticed a severe performance degradation during file
> unlinking in a FreeBSD guest. Host was running quite recent stable/11
> before.
>
> On the host, the vm is started with
> ahci,hd:/dev/adaN
>
> The guest attaches:
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: ACS-2 ATA SATA 3.x device
>
> The guest has very high Sys-load during unlinking (50-75%@2 cores).
> Also, the host logs these errors:

Sorry, it is not the host that logs these errors, but the guest, obviously.

> ahcich0: Timeout on slot 0 port 0
> [...]

From owner-freebsd-scsi@freebsd.org Sat Jul 1 07:58:28 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BD7BBD9F341 for ; Sat, 1 Jul 2017 07:58:28 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id A41807D574 for ; Sat, 1 Jul 2017 07:58:28 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: by mailman.ysv.freebsd.org (Postfix) id A082BD9F340; Sat, 1 Jul 2017 07:58:28 +0000 (UTC) Delivered-To: scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A0190D9F33F for ; Sat, 1 Jul 2017
07:58:28 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 421847D573; Sat, 1 Jul 2017 07:58:28 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (ezra.dcm1.omnilan.net [IPv6:2a00:e10:2800::a135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v617wOdX060491; Sat, 1 Jul 2017 09:58:24 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (s1.omnilan.de [217.91.127.234]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 0B5F35A8; Sat, 1 Jul 2017 09:58:23 +0200 (CEST) Message-ID: <5957561F.7030906@omnilan.de> Date: Sat, 01 Jul 2017 09:58:23 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Alexander Motin , scsi@freebsd.org Subject: Re: not a reproducable regression [Was :Re: bhyve ahcich0: Timeout on slot 0 port 0,, regression with stable/11->releng/11.1] References: <5956A3DF.8060109@omnilan.de> <5956A4E0.3030108@omnilan.de> In-Reply-To: <5956A4E0.3030108@omnilan.de> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]); Sat, 01 Jul 2017 09:58:24 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: ; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Jul 2017 07:58:28 -0000

Regarding Harry Schmalzbauer's message of 30.06.2017 21:22 (localtime):
> Regarding Harry Schmalzbauer's message of 30.06.2017 21:17 (localtime):
>> Hello,
>>
>> on releng/11.1 I noticed a severe performance degradation during file
>> unlinking in a FreeBSD guest. Host was running quite recent stable/11
>> before.
>>
>> On the host, the vm is started with
>> ahci,hd:/dev/adaN
>>
>> The guest attaches:
>> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
>> ada0: ACS-2 ATA SATA 3.x device
>>
>> The guest has very high Sys-load during unlinking (50-75%@2 cores).
>> Also, the host logs these errors:
> Sorry, it is not the host that logs these errors, but the guest, obviously.
>

After a reboot, I couldn't reproduce the timeouts, nor the multiplied completion time.
It takes ~8 minutes, as it always has…
Sorry for the late retraction.

I'm still wondering what these timeout error codes mean.

And I still see very high CPU load in the guest during unlinking (obj-tree of usr/src).
While systat reports fewer than 10 irqs/s for AHCI0, "intr" consumes 30-800% at times (with 4 vCPUs).
"bufdaemon" and "rm" consume 10-100% each.
So the total average is about 2 fully loaded cores for 'rm' running in a FreeBSD guest.

Any hints regarding the timeout codes, or on how to trace where the CPU cycles go, are highly appreciated!

Thanks,

-harry