Subject: Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???)
From: Ben RUBSON
Date: Wed, 16 Aug 2017 11:02:06 +0200
To: Julien Charbon
Cc: Hans Petter Selasky, FreeBSD Net, hiren, Slawa Olhovchenkov, FreeBSD Stable
Message-Id: <13DE4E6D-CE83-4B5D-BF88-0EFE65111311@gmail.com>
In-Reply-To: <645f2ee3-3eaa-660e-2a64-37d53e88322f@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

> On 15 Aug 2017, at 23:33, Julien Charbon wrote:
>
> On 8/11/17 11:32 AM, Ben RUBSON wrote:
>>> On 08 Aug 2017, at 13:33, Julien Charbon wrote:
>>>
>>> On 8/8/17 10:31 AM, Hans Petter Selasky wrote:
>>>>
>>>> Suggested fix attached.
>>>
>>> I agree with your conclusion. Just for the record, more precisely this
>>> regression seems to have been introduced with:
>>> (...)
>>> Thus good catch, and your patch looks good. I am just going to verify
>>> the other in_pcbrele_wlocked() calls in the TCP stack.
>>
>> Julien, do you plan to make this fix reach 11.0-p12?
>
> I am checking if your issue is another flavor of the issue fixed by:
>
> https://svnweb.freebsd.org/base?view=revision&revision=307551
> https://reviews.freebsd.org/D8211
>
> This fix is not in 11.0 but in 11.1. So far I have not found how an
> inp in INP_TIMEWAIT state could have been INP_FREED without having its tw
> already set to NULL, except through the issue fixed by r307551.
>
> Thus could you try to apply this patch:
>
> https://github.com/freebsd/freebsd/commit/acb5bfda99b753d9ead3529d04f20087c5f7d0a0.patch
>
> and see if you can still reproduce this issue?

Thank you for your answer, Julien.

Unfortunately, I'm not at all sure how to reproduce the issue.
I have other servers which are 100% identical to this one, with the same workload and the same uptime of several months, but they have not triggered the bug yet.

If other network stack experts (which I'm not) agree with your analysis, we could then certainly go further with D8211 / r307551.

One thing that might help:

# netstat -an | grep TIME_WAIT$ | wc -l
468

Note that due to this ongoing bug, sendmail has a lot of difficulty sending outgoing mail.
As soon as I run the above netstat command, I receive a lot of queued mails (more than 20 this time), as if netstat were somehow able to help...
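A single count of 468 is only a snapshot. To confirm that the number of TIME_WAIT sockets keeps growing rather than draining, the netstat command above could be sampled periodically. Below is a minimal POSIX sh sketch, under stated assumptions: the function names `count_time_wait` and `watch_time_wait`, the default 60-second interval and 10 samples, and the script name used further down are all hypothetical choices, not part of any FreeBSD tool. It also assumes, as the original grep does, that the TCP state is the last column of `netstat -an` output.

```shell
#!/bin/sh
# Hypothetical helper: count lines whose last column is TIME_WAIT.
# It reads netstat-style output on stdin, so it can be tried on saved output.
count_time_wait() {
    # grep -c prints the number of matching lines ("0" if none);
    # "|| true" keeps a zero count from returning a failure status.
    grep -c 'TIME_WAIT$' || true
}

# Hypothetical sampler: print a timestamp and the TIME_WAIT count
# every INTERVAL seconds, SAMPLES times (defaults are assumptions).
watch_time_wait() {
    interval=${1:-60}
    samples=${2:-10}
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(netstat -an | count_time_wait)"
        i=$((i + 1))
        # Do not sleep after the last sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Only start sampling when the script is given arguments.
if [ "$#" -gt 0 ]; then
    watch_time_wait "$@"
fi
```

For example, saving this as a hypothetical `timewait.sh` and running `sh timewait.sh 300 12` would print 12 counts taken five minutes apart; a steadily rising column would match the behavior described here.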
The number of TIME_WAIT connections, however, does not decrease; it increases.

> And in the spirit of the r307551 fix, and based on Hans' patch, I will also
> propose to add a kernel log describing the issue instead of starting an
> infinite loop when INVARIANTS is not set.

Which should then never be triggered :)
Good idea, I think!

Thank you again!

Ben