From: Tom Vijlbrief
Date: Mon, 12 Jun 2017 13:31:15 +0000
Subject: Re: [Bug 219927] awg0 stops working after a long output under ssh
To: Henri Hennebert, freebsd-arm

Tested with TX_MAX_SEGS at 20 and that is also stable for me, so I added a
patch to the original bug report:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219927

The only downside I can see is a modest increase in kernel stack usage:

Index: sys/arm/allwinner/if_awg.c
===================================================================
--- sys/arm/allwinner/if_awg.c	(revision 319826)
+++ sys/arm/allwinner/if_awg.c	(working copy)
@@ -92,7 +92,7 @@
 #define	TX_SKIP(n, o)		(((n) + (o)) & (TX_DESC_COUNT - 1))
 #define	RX_NEXT(n)		(((n) + 1) & (RX_DESC_COUNT - 1))
 
-#define	TX_MAX_SEGS		10
+#define	TX_MAX_SEGS		20
 
 #define	SOFT_RST_RETRY		1000
 #define	MII_BUSY_RETRY		1000
@@ -419,14 +419,18 @@
 	    sc->tx.buf_map[index].map, m, segs, &nsegs, BUS_DMA_NOWAIT);
 	if (error == EFBIG) {
 		m = m_collapse(m, M_NOWAIT, TX_MAX_SEGS);
-		if (m == NULL)
+		if (m == NULL) {
+			device_printf(sc->miibus, "awg_setup_txbuf: m_collapse failed\n");
 			return (0);
+		}
 		*mp = m;
 		error = bus_dmamap_load_mbuf_sg(sc->tx.buf_tag,
 		    sc->tx.buf_map[index].map, m, segs, &nsegs, BUS_DMA_NOWAIT);
 	}
-	if (error != 0)
+	if (error != 0) {
+		device_printf(sc->miibus, "awg_setup_txbuf: bus_dmamap_load_mbuf_sg failed\n");
 		return (0);
+	}
 
 	bus_dmamap_sync(sc->tx.buf_tag, sc->tx.buf_map[index].map,
 	    BUS_DMASYNC_PREWRITE);

On Mon 12 Jun 2017 at 10:47, Tom Vijlbrief wrote:

>
>
> On Mon 12 Jun 2017 at 09:59, Henri Hennebert wrote:
>
>> On 06/11/2017 17:54, Tom Vijlbrief wrote:
>> >
>> > On Sun 11 Jun 2017 at 16:23:
>> >
>> >     https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219927
>> >
>> >     Bug ID: 219927
>> >     Summary: awg0 stops working after a long output under ssh
>> >     Product: Base System
>> >     Version: CURRENT
>> >     Hardware: arm64
>> >     OS: Any
>> >     Status: New
>> >     Severity: Affects Only Me
>> >     Priority: ---
>> >     Component: arm
>> >     Assignee: freebsd-arm@FreeBSD.org
>> >     Reporter: hlh@restart.be
>> >
>> >     Environment: pine64+ 2GB
>> >     FreeBSD norquay.restart.bel 12.0-CURRENT FreeBSD 12.0-CURRENT #0
>> >     r318945M: Sat Jun 10 11:47:44 CEST 2017
>> >     root@norquay.restart.bel:/usr/obj/usr/src/sys/NORQUAY arm64
>> >
>> >     If I connect from a wireless computer (FreeBSD 11.1-PRERELEASE #0
>> >     r318860) and run a command with a big output (e.g. `find /`),
>> >     awg0 stops working quickly (under 20 seconds of output).
>> >
>> >     If I do the same with telnet from the same computer, the output
>> >     runs much longer, but awg0 still stops working.
>> >
>> >     If I do the same from a wired computer then I must run `find /`
>> >     2 or 3 times before awg0 stops working.
>> >
>> >     I can rsync 12GB through ssh without problem in both directions
>> >     (from and to the pine64 and the wireless computer).
>> >
>> >     I have a `tcpdump -w ssh.data port 22`. (8.3 MB)
>> >
>> >     I can connect with a serial console to the pine64 after awg0
>> >     stops working.
>> >         ifconfig awg0 down
>> >         ifconfig awg0 up
>> >     doesn't restore the connectivity. I must reboot to restore
>> >     connectivity.
>> >
>> >
>> > That's a coincidence, today I'm investigating the same issue.
>> >
>> > You could try increasing TX_MAX_SEGS in sys/arm/allwinner/if_awg.c line 95.
>> >
>> > I'm currently testing TX_MAX_SEGS set to 40 and no lock up yet....
>>
>> Bingo. Your solution solved the problem.
>>
>> Thanks a lot.
>>
>
> Good to hear!
>
> Increasing from 10 to 20 is probably sufficient. It is not clear to me
> what the adverse effects are of too high a value.
>
> The root cause is that the driver tries to call m_collapse with this limit
> and this will fail. The TCP stack will resend the packet and the
> m_collapse will fail again and again and ...
>
>