Date: Wed, 30 Dec 2020 07:40:01 -0800
From: John Kennedy
To: "Hartmann, O."
Cc: FreeBSD CURRENT
Subject: Re: # Fssh_packet_write_wait: Connection to 77.183.250.3 port 22: Broken pipe
References: <20201230080403.5474da7c@hermann.fritz.box>
In-Reply-To: <20201230080403.5474da7c@hermann.fritz.box>

On Wed, Dec 30, 2020 at 08:04:03AM +0100, Hartmann, O. wrote:
> On recent 12-STABLE, 12.1-RELENG and 12.2-RELENG I face a very nasty problem which
> occured a while ago after it seemed to have vanished for a while: running ssh in a xterm
> on FreeBSD boxes as mentioned at the beginning ends up very rapidly in a lost connection
> with
>
> # Fssh_packet_write_wait: Connection to XXX.XXX.XXX.XXX port 22: Broken pipe
>
> The backend is in most cases a CURRENT, 12.1-RELENG or 12.2-RELENG or 12-STABLE server.
> A couple of months ago we moved from 11.3-RELENG to 12.1-RELENG (server side, clients were
> always 13-CURRENT or 12-STABLE). With FreeBSD 11 as the backend, those broken pipes
> occured, but not that frequent and rapid as it is the fact now.
>
> The "problem" can be mitigated somehow: running top or using the console prevents the
> broken pipe fault for a while, but it still occurs. Running "screen" (port
> sysutils/screen) does extend the usability of the console for a significant timespan, but
> the broken pipe also occurs randomly, but it takes a significant time to occur.

So, I do a LOT of ssh-in-xterm and I can't say that I've seen anything that
looks like it is FreeBSD's fault (vs ISP, work firewall, work VPN, etc).

For my cloud host (12.2-p2) I do tend to use the screen program. At work, in
pre-Covid times (so up to last March 18th or so, whatever that works out to in
versioning/revisions; probably 12.1 or 12.0), I'd have sessions open for a
week or more. At home I'm all 13 at the moment.

Because I'm running a lot of 13 at home (and before that, 12-stable) I tend
to reboot the box for update reasons.

Is it safe to assume that "very rapidly" is measured in sub-days?

> My conclusion is: either there is a serious problem with FreeBSD since 12, or there is a
> config issue I'm not aware of, even with "vanilla" installations from official repository
> running unchanged.

At work, my problems are all about crappy firewalls. Even firewalls that
we've spent a LOT of money on (PaloAlto, the Juniper before it). In all
fairness to them, we're running a University's worth of class-B through there
and they have all the state-tracking/deep-inspection goodness turned on trying
to protect everyone from the big bad internet, so it's complicated. With putty,
I've had to turn on TCP/IP keepalives and the sending of null packets.
The problem there just seems to be that the firewall hardware can only track
so many sessions and, when you stress it, it'll drop "idle" sessions (vs
active, vs not opening up a new one). Systems hemorrhage connections all the
time when something eats the final connection-close packet, but they can time
the thing out. The PaloAlto in my case doesn't know that, so it just starts
reaping and catches valid idle connections some of the time.

So all my tricks just involve some amount of traffic to keep that session
more alive in the non-host-state-tracker's brain. For SSH at work, I've set
this up:

	host *
		TCPKeepAlive yes
		ServerAliveInterval 60
		ServerAliveCountMax 3

So, send TCP/IP keepalive packets, send some traffic every 60 seconds, and
tear down the session if you miss 3 of those.

I'll note that at home I haven't had to do that. For that cloud 12.2 system,
I've had a connection "idle" for 21 hours (but with screen running, which
generates some bidirectional traffic because it has a date/time stamp that
gets updated once a minute). Is 21 hours "significant" by your measurements?

At home, I don't have a network firewall of any sort. There are probably the
usual unknowns with the ISP and the crappyware NAT box they force me to use.

My cloud system is running on DigitalOcean, for what that is worth. I'm not
sure what they're doing for firewalls (I'm doing host firewalls out there, so
maybe nothing in my case).
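
For reference, the client-side keepalive settings above have a server-side
counterpart in sshd_config(5). This is only a sketch, assuming you control the
server and that its sshd is stock OpenSSH. The values simply mirror the client
settings and are not something I've tuned. With an interval of 60 seconds and
a count of 3, an unresponsive peer gets dropped after roughly 180 seconds, the
same arithmetic as on the client side.

```
# /etc/ssh/sshd_config (server side; values are illustrative, mirroring the
# client-side example, not a tested recommendation)

# Ask the client for a response through the encrypted channel after 60
# seconds without receiving any data from it.
ClientAliveInterval 60

# Disconnect after 3 consecutive unanswered requests (about 180 seconds).
ClientAliveCountMax 3
```

Setting either side is normally enough to keep some traffic flowing through a
state-tracking firewall; sshd has to be reloaded before a change takes effect.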