From owner-freebsd-net@freebsd.org Tue Aug 8 12:00:46 2017 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E2429DD79B9; Tue, 8 Aug 2017 12:00:46 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wr0-f195.google.com (mail-wr0-f195.google.com [209.85.128.195]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 763228367A; Tue, 8 Aug 2017 12:00:46 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by mail-wr0-f195.google.com with SMTP id g32so2267016wrd.5; Tue, 08 Aug 2017 05:00:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=HMGu62aG5xnbzGPlxvUFWp4spkuQo/iym+Ss38zM/5M=; b=GeWsymHpGTmWhQ05xJGImrm+7MqsI9ozqJO9nFIj6p798VtSdv0QhRHdhkla3NriAc jCOM2gxxXphzf2URaduGPbeH+tKheU+WQWpDFoE25zqUU9wombxAchYjYx7uFNK7GHYu 3MXRk3htGGXWf+2YdyfASLRnayNG/BKVX5AUoeWkMBXoJZdIrwntgcqGPmYPOuE/aOUN WaNd7Z6D4/E6PChn/iUO0SwYQYsTjhT9ca9Z4pUQ5Y2a/b/dnLeIzNM2+2rdUDYuGTMA aUiT0N6BuY/+HJloW7SaFJzQ6Ds6SetjN/mGLCliUUHFD2Bar2+pq4UrbUvlzmqz8wdE a8Jw== X-Gm-Message-State: AHYfb5ghuz2qlJAEHcj/5GBGYW99xf97gAs2QKajlE3WQan/lsrnuGLb +jtleA7L2vCmlOU01z8= X-Received: by 10.223.177.202 with SMTP id r10mr2601657wra.36.1502192025945; Tue, 08 Aug 2017 04:33:45 -0700 (PDT) Received: from [10.100.64.12] ([217.30.88.7]) by smtp.gmail.com with ESMTPSA id g93sm1533468wrd.11.2017.08.08.04.33.45 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 08 Aug 2017 04:33:45 -0700 (PDT) Subject: Re: mlx4en, timer irq @100%... (11.0 stuck on high network load ???) To: Hans Petter Selasky , Ben RUBSON , FreeBSD Net , hiren , Slawa Olhovchenkov , FreeBSD Stable References: <4DF74CB8-23D2-4CCF-B699-5B86DAEA65E5@gmail.com> <40602CEA-D417-4E5B-8C68-916958D49A0B@gmail.com> <9c306f10-7c05-d28d-e551-a930603aaafa@selasky.org> <896dd782-cb2c-0259-65d1-b00daae452de@FreeBSD.org> <0DB9F6FF-8BC9-48F5-B359-AC1905B9EB06@gmail.com> <7f14c95d-1ef8-bf82-c469-e6566c3aba66@selasky.org> <76A5EE7E-1D2E-46B4-86F1-F219C3DCE6EA@gmail.com> <4C91C6E5-0725-42E7-9813-1F3ACF3DDD6E@gmail.com> <5840c25e-7472-3276-6df9-1ed4183078ad@selasky.org> <2ADA8C57-2C2D-4F97-9F0B-82D53EDDC649@gmail.com> <061cdf72-6285-8239-5380-58d9d19a1ef7@selasky.org> <92BEE83D-498F-47D5-A53C-39DCDC00A0FD@gmail.com> <5d8960d8-e1ff-8719-320f-d3ae84054714@selasky.org> <6B4A35F7-5694-4945-9575-19ADB678F9FA@gmail.com> <297a784a-3d80-b1a6-652e-a78621fe5a8b@selasky.org> <3ECCFBF1-18D9-4E33-8F39-0C366C3BB8B4@gmail.com> From: Julien Charbon Message-ID: <0a5787c5-8a53-ab09-971a-dc1cd5f3aca0@freebsd.org> Date: Tue, 8 Aug 2017 13:33:43 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Aug 2017 12:00:47 -0000 Hi, On 8/8/17 10:31 AM, Hans Petter Selasky wrote: > On 08/08/17 10:06, Ben RUBSON wrote: >>> On 08 Aug 2017, at 10:02, Hans Petter Selasky wrote: >>> >>> On 08/08/17 10:00, Ben RUBSON wrote: >>>> kgdb) print *twq_2msl.tqh_first >>>> $2 = { >>>> tw_inpcb = 0xfffff8031c570740, >>> >>> print *twq_2msl.tqh_first->tw_inpcb >> > > Here is the conclusion: > > The following code is going in an infinite loop: > > >> for (;;) { >> TW_RLOCK(V_tw_lock); >> tw = TAILQ_FIRST(&V_twq_2msl); >> if (tw == NULL || (!reuse && (tw->tw_time - ticks) > >> 0)) { >> TW_RUNLOCK(V_tw_lock); >> break; >> } >> KASSERT(tw->tw_inpcb != NULL, ("%s: tw->tw_inpcb == >> NULL", >> __func__)); >> >> inp = tw->tw_inpcb; >> in_pcbref(inp); >> TW_RUNLOCK(V_tw_lock); >> >> if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { >> >> INP_WLOCK(inp); >> tw = intotw(inp); >> if (in_pcbrele_wlocked(inp)) { > > in_pcbrele_wlocked() returns (1) because INP_FREED (16) is set in > inp->inp_flags2. I guess you have invariants disabled, because the > KASSERT() below should have caused a panic. > >> KASSERT(tw == NULL, ("%s: held last inp " >> "reference but tw not NULL", >> __func__)); >> INP_INFO_RUNLOCK(&V_tcbinfo); >> continue; >> } > > This is a regression issue after: > >> commit 5630210a7f1dbbd903b77b2aef939cd47c63da58 >> Author: jch >> Date: Thu Oct 30 08:53:56 2014 +0000 >> >> Fix a race condition in TCP timewait between tcp_tw_2msl_reuse() and >> tcp_tw_2msl_scan(). This race condition drives unplanned timewait >> timeout cancellation. Also simplify implementation by holding inpcb >> reference and removing tcptw reference counting. > > Suggested fix attached. I agree we your conclusion. Just for the record, more precisely this regression seems to have been introduced with: commit b02d40ddcda08b51a49e5667e6808f5dc5ec0472 Author: fabient Date: Wed Nov 25 14:45:43 2015 +0000 The r241129 description was wrong that the scenario is possible only for read locks on pcbs. The same race can happen with write lock semantics as well. The race scenario: - Two threads (1 and 2) locate pcb with writer semantics (INPLOOKUP_WLOCKPCB) and do in_pcbref() on it. - 1 and 2 both drop the inp hash lock. - Another thread (3) grabs the inp hash lock. Then it runs in_pcbfree(), which wlocks the pcb. They must happen faster than 1 or 2 come INP_WLOCK()! - 1 and 2 congest in INP_WLOCK(). - 3 does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which doesn't free the pcb due to two references on it. Then it unlocks the pcb. - 1 (or 2) gets wlock on the pcb, runs in_pcbrele_wlocked(), which doesn't report inp as freed, due to 2 (or 1) still helding extra reference on it. The thread tries to do smth with a disconnected pcb and crashes. Submitted by: emeric.poupon@stormshield.eu Reviewed by: gleb@ MFC after: 1 week Sponsored by: Stormshield Tested by: Cassiano Peixoto, Stormshield Notes: svn path=/head/; revision=291301 Before this change in_pcbrele_wlocked() returned 1 only on the last reference which is what tcp_tw_2msl_scan() currently expects. Thus good catch, and your patch looks good. I am going to just verify the other in_pcbrele_wlocked() calls in TCP stack. Thanks. -- Julien