Date: Tue, 8 Jun 2021 22:47:25 +0200
From: Michael Gmelin
To: "freebsd-current@freebsd.org"
Subject: Re: ssh connections break with "Fssh_packet_write_wait" on 13 [SOLVED]
Message-ID: <20210608224725.35930d70@bsd64.grem.de>
In-Reply-To: <20210603150906.48cbd638@bsd64.grem.de>

On Thu, 3 Jun 2021 15:09:06 +0200
Michael Gmelin wrote:

> On Tue, 1 Jun 2021 13:47:47 +0200
> Michael Gmelin wrote:
>
> > Hi,
> >
> > Since upgrading servers from 12.2 to 13.0, I get
> >
> >   Fssh_packet_write_wait: Connection to 1.2.3.4 port 22: Broken pipe
> >
> > consistently, usually after about 11 idle minutes, with and without
> > pf enabled. The client (11.4 in a VM) wasn't altered.
> >
> > Verbose logging (client and server side) doesn't show anything
> > special when the connection breaks. In the past, QoS problems
> > caused these disconnects, but I didn't see anything apparent
> > changing between 12.2 and 13 in this respect.
> >
> > I did a test on a newly commissioned server to rule out other
> > factors (so, same client connections, same routes, same
> > everything). On 12.2 before the update: the connection stays open
> > for hours.
> > After the update (same server): connections break
> > consistently after < 15 minutes (this is with unaltered
> > configurations, no *AliveInterval configured on either side of the
> > connection).
>
> I did a little more testing and realized that the problem goes
> away when I disable "Proportional Rate Reduction per RFC 6937" on the
> server side:
>
>   sysctl net.inet.tcp.do_prr=0
>
> Keeping it on and enabling net.inet.tcp.do_prr_conservative doesn't
> fix the problem.
>
> This seems to be specific to Parallels. After some more digging, I
> realized that Parallels Desktop's NAT daemon (prl_naptd) handles
> keep-alive between the VM and the external server on its own. There is
> no direct communication between the client and the server. This means:
>
> - The NAT daemon starts sending keep-alive packets right away (not
>   after the VM's net.inet.tcp.keepidle), every 75 seconds.
> - Keep-alive packets originating in the VM never reach the server.
> - Keep-alive packets originating on the server never reach the VM.
> - Client and server basically do keep-alive with the NAT daemon, not
>   with each other.
>
> It also seems like Parallels is filtering the TOS field (so it's
> always 0x00), but that's unrelated.
>
> I configured a bhyve VM running FreeBSD 11.4 on a separate laptop on
> the same network for comparison, and it has no such issues.
>
> Looking at tcpdump output on the server, this is what a keep-alive
> packet sent by Parallels looks like:
>
>   10:14:42.449681 IP (tos 0x0, ttl 64, id 15689, offset 0, flags
>   [none], proto TCP (6), length 40)
>   192.168.1.1.58222 > 192.168.1.2.22: Flags [.], cksum x (correct),
>   seq 2534, ack 3851, win 4096, length 0
>
> Those originating from the bhyve VM (after lowering
> net.inet.tcp.keepidle), on the other hand, look like this:
>
>   12:18:43.105460 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF],
>   proto TCP (6), length 52)
>   192.168.1.3.57555 > 192.168.1.2.22: Flags [.], cksum x
>   (correct), seq 1780337696, ack 45831723, win 1026, options
>   [nop,nop,TS val 3003646737 ecr 3331923346], length 0
>
> As written above, once net.inet.tcp.do_prr is disabled, keep-alive
> seems to work just fine. Otherwise, Parallels' NAT daemon kills the
> connection, as its keep-alive requests are not answered (well, that's
> what I think is happening):
>
>   10:19:43.614803 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
>   proto TCP (6), length 40)
>   192.168.1.1.58222 > 192.168.1.2.22: Flags [R.], cksum x (correct),
>   seq 2535, ack 3851, win 4096, length 0
>
> The easiest way to work around the problem on the client side is to
> configure ServerAliveInterval in ~/.ssh/config in the client VM.
>
> I'm curious, though, whether this is basically a Parallels problem that
> has only been exposed by PRR being more correct (which is what I
> suspect), or whether this is actually a FreeBSD problem.
>

So, PRR was probably a red herring. The real reason this happens is
that FreeBSD (since version 13[0]) by default discards packets without
timestamps on connections that negotiated timestamps. This new
behavior seems to be in line with RFC 7323, section 3.2[1]:

  "Once TSopt has been successfully negotiated, that is both <SYN> and
  <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
  segment for the duration of the connection, and SHOULD be sent in an
  <RST> segment (see Section 5.2 for details)."
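For illustration, here is a toy POSIX sh model of how I read that rule. It is not FreeBSD's kernel code, and tcp_option_bytes and segment_verdict are hypothetical helper names.

tcp_option_bytes derives the TCP option bytes from tcpdump's IPv4 "length" field. It assumes a 20-byte IPv4 header, a 20-byte base TCP header and no payload. Under those assumptions, the Parallels probe quoted earlier (length 40) carries no options at all. The bhyve probe (length 52) carries 12 bytes: NOP, NOP and the 10-byte timestamp option.

segment_verdict applies the drop rule. It assumes that RST segments and net.inet.tcp.tolerate_missing_ts=1 are the only exceptions.

```shell
#!/bin/sh
# Toy model (hypothetical helper names) of the RFC 7323, section 3.2 rule
# discussed in this thread. This is NOT FreeBSD's kernel code.

# Number of TCP option bytes in a segment, given tcpdump's IPv4 "length"
# field and the payload length. Assumes a 20-byte IPv4 header (no IP
# options) and a 20-byte base TCP header.
tcp_option_bytes() {
    ip_len=$1
    payload_len=$2
    echo $((ip_len - 20 - 20 - payload_len))
}

# Verdict for an incoming segment; each argument is 0 or 1:
#   $1 ts_negotiated  both SYN and SYN,ACK carried TSopt
#   $2 has_ts         this segment carries TSopt
#   $3 is_rst         this segment has the RST flag set
#   $4 tolerate       value of net.inet.tcp.tolerate_missing_ts
# Prints "drop" or "accept". RSTs are assumed to be exempt.
segment_verdict() {
    if [ "$1" -eq 1 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ] && [ "$4" -eq 0 ]; then
        echo drop
    else
        echo accept
    fi
}

# Parallels keep-alive probe (length 40): no options, hence no timestamp.
tcp_option_bytes 40 0          # 0
# bhyve keep-alive probe (length 52): NOP, NOP, TS = 12 bytes.
tcp_option_bytes 52 0          # 12
# Probe without TS on a connection that negotiated TS:
segment_verdict 1 0 0 0        # drop (FreeBSD 13 default)
segment_verdict 1 0 0 1        # accept (tolerate_missing_ts=1)
```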
As it turns out, macOS does exactly that: it sends keep-alive packets
without a timestamp on connections that negotiated timestamps. Under
normal circumstances (ssh from macOS to a server running FreeBSD 13)
this goes unnoticed, since macOS uses the same keep-alive defaults as
FreeBSD (2 hours idle time, 75 second intervals). The server-initiated
keep-alive therefore saves the connection before eight consecutive
unanswered keep-alives on the client side can break it.

This is different for ssh connections originating from a VM inside
Parallels, because prl_naptd starts sending TCP keep-alives shortly
after the connection becomes idle. As a result, idle connections break
after about 11 minutes (60s + 8*75s = 660s == 11m), unless
countermeasures are taken.

An easy way to demonstrate the problem is to change the keep-alive
defaults on *macOS* using sysctl (the values are in milliseconds) and
ssh to a FreeBSD 13 server:

  $ sudo sysctl net.inet.tcp.keepidle=5000
  $ sudo sysctl net.inet.tcp.keepintvl=5000
  $ ssh -oTCPKeepAlive=yes myserver

This makes the problem easy to reproduce. The connection drops with a
broken pipe after 45-60 seconds of idle time (5s + 8*5s = 45s).
tcpdump confirms that the keep-alive packets don't carry TCP
timestamps, even though timestamps were negotiated when the connection
was established.

There are various ways to work around the issue.

Client side workarounds:
- Use ServerAlive* settings in ~/.ssh/config (ssh only)
- Tune the net.inet.tcp.keep* sysctls on macOS (affects all services)

Server side workarounds:
- Use ClientAlive* settings in /etc/ssh/sshd_config (ssh only)
- Tolerate missing timestamps using a sysctl, which makes FreeBSD 13
  behave like previous versions did:

    sysctl net.inet.tcp.tolerate_missing_ts=1

The last option is probably the most practical one.

rscheff@ and tuexen@ (thank you!) were able to reproduce the issue and
reached out to Apple to see if they can fix this on their end (macOS)
in the future.
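The keep-alive arithmetic used in this mail can be written as a small POSIX sh helper (keepalive_teardown_secs is a hypothetical name). It assumes the usual keep-alive model:

- the first probe is sent after <idle> seconds without traffic;
- further probes follow every <interval> seconds;
- the probing side gives up after <count> unanswered probes.

Under that model the connection is torn down after idle + count * interval seconds. The 60 s initial delay for prl_naptd comes from the 11-minute estimate in this mail, not from Parallels documentation.

```shell
#!/bin/sh
# Seconds until an idle TCP connection is torn down by keep-alive,
# assuming the probing side gives up after <count> unanswered probes.
#   $1 idle      seconds without traffic before the first probe
#   $2 interval  seconds between probes
#   $3 count     unanswered probes before giving up
keepalive_teardown_secs() {
    echo $(($1 + $3 * $2))
}

# Parallels' prl_naptd: ~60 s idle, then 8 probes every 75 s.
keepalive_teardown_secs 60 75 8      # 660 (11 minutes)
# macOS demo: keepidle=5000 ms, keepintvl=5000 ms, 8 probes.
keepalive_teardown_secs 5 5 8        # 45
# macOS and FreeBSD defaults: 2 h idle, 75 s interval, 8 probes.
keepalive_teardown_secs 7200 75 8    # 7800
```

With the defaults, the server starts its own probes at 7200 s, well before the client side would give up at 7800 s. This fits the explanation of why plain macOS clients aren't affected. Assuming any traffic resets the probe sequence, an ssh-level keep-alive (ServerAliveInterval or ClientAliveInterval) comfortably below 660 s should presumably keep Parallels connections alive as well.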
Best
Michael

[0] https://cgit.freebsd.org/src/commit/?id=283c76c7c3f2f634f19f303a771a3f81fe890cab
[1] https://datatracker.ietf.org/doc/html/rfc7323#section-3.2

--
Michael Gmelin