Date:      Sun, 14 Aug 2016 03:22:47 -0400
From:      john hood <cgull@glup.org>
To:        Peter Jeremy <peter@rulingia.com>
Cc:        mosh-devel@mit.edu, freebsd-current@freebsd.org, freebsd-ports@freebsd.org
Subject:   Re: Mosh regression between 10.x and 11-stable
Message-ID:  <cded71fa-9970-d0c3-de95-6b2c1c46a734@glup.org>
In-Reply-To: <20160813083049.GF65184@server.rulingia.com>
References:  <20160810081831.GA65184@server.rulingia.com> <d577d1f9-d3ce-ee83-1051-d83d0c96591b@glup.org> <20160811101928.GC65184@server.rulingia.com> <68d0a6d4-2078-000c-6e22-b0b8721dfd2b@glup.org> <20160811195348.GD65184@server.rulingia.com> <971e1e82-1b95-d1e4-6053-a8548fc1d109@glup.org> <549fba5d-fb5e-ffe5-aec6-d03159628118@glup.org> <20160813083049.GF65184@server.rulingia.com>

On 8/13/16 4:30 AM, Peter Jeremy wrote:
> Hi John,
> 
> Sorry, I got side-tracked.
> 
> On 2016-Aug-12 16:37:15 -0400, John Hood <cgull@glup.org> wrote:
>> Could I ask you to look at this a little further?  On the one hand, it
>> sure looks like a Mosh issue, and tcdrain() solves it-- but on the other
>> hand, this is a regression on FreeBSD and we don't see this issue on any
>> other OS.  I'd like to fully understand this and make sure that this is
>> not a kernel issue.
> It's got me puzzled as well.  And it's only getting weirder...  The
> following is using an unmodified mosh-1.2.5, built from the port, as
> the server on FreeBSD 11.0-BETA4 r303957.  The client is 1.2.4a-1ubuntu1
> on Linux.  The standard driver script consistently fails, but adding a
> "print" makes it mostly work.  Where there's a "[mosh is exiting.]"
> message, it was successful (and would report that there were other
> orphaned servers since I wasn't waiting the 60 seconds for servers to
> die between invocations).  For completeness, I've tried ktrace'ing
> mosh-server but can't make it fail when I do so.

I've now managed to reproduce the issue *and* ktrace it (and sshd) on a
VPS: a single-CPU VM on a badly oversubscribed VMware host, with an OS X
client.

mosh-server shows a straightforward execution trace: the parent
successfully writes the MOSH CONNECT message to stdout, forks, and exits.
About a millisecond later, the child starts running, writes the verbose
copyright/etc. message to stderr, and gets EIO (which it ignores).
Shortly thereafter it closes its pty slave fds on stdin/stdout/stderr.
It continues normally from there.

In that millisecond, the sshd trace shows that it catches SIGCHLD,
writes utmp info, wait()s for a child and gets mosh-server's pid, and
does a final read() that returns 0 bytes.  It then closes the pty
master, without doing revoke() or any ioctls.
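To make the two halves of that ordering concrete, here is a minimal single-process sketch, assuming a POSIX system with posix_openpt(). It is not sshd's or Mosh's code, and the helper names (open_pty_pair, read_then_close_master, write_after_hangup) are hypothetical. One helper plays sshd: a single read() on the master followed by a plain close() with no revoke() or ioctls. The other plays the forked child: a write on the slave after the master is gone, which I'd expect to fail with EIO, as the child's stderr write did in the trace.

```c
#define _XOPEN_SOURCE 600 /* posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a pty master and its slave, neither as our
 * controlling terminal. Returns 0 on success, -1 on error. */
int open_pty_pair(int *master, int *slave)
{
    int m = posix_openpt(O_RDWR | O_NOCTTY);
    if (m < 0)
        return -1;
    if (grantpt(m) != 0 || unlockpt(m) != 0) {
        close(m);
        return -1;
    }
    const char *name = ptsname(m);
    int s = name != NULL ? open(name, O_RDWR | O_NOCTTY) : -1;
    if (s < 0) {
        close(m);
        return -1;
    }
    *master = m;
    *slave = s;
    return 0;
}

/* Plays sshd: one read() of whatever is pending on the master (blocks if
 * nothing is pending), then a plain close() with no revoke() or ioctls.
 * Returns what read() returned. */
ssize_t read_then_close_master(int master, char *buf, size_t cap)
{
    ssize_t n = read(master, buf, cap);
    close(master);
    return n;
}

/* Plays the forked child: write to the slave after the master is gone.
 * Returns the failed write's errno, or 0 if the write succeeded. */
int write_after_hangup(int slave, const char *msg)
{
    if (write(slave, msg, strlen(msg)) >= 0)
        return 0;
    return errno;
}
```

Note that this sketch shows the benign order, where the master reads before it closes. In the failing trace, sshd's final read() returned 0 bytes and the MOSH CONNECT line never reached it.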

So the pty driver is clearly dropping the MOSH CONNECT message.  It's a
lot less clear whether that's a bug.  On the one hand,
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap11.html#tag_11_01_11>
states that output should be drained on final close (which in this case
is done by the forked mosh-server).  On the other hand, the login shell
is the session leader, and
<http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap11.html#tag_11_01_03>
requires that the terminal be disassociated from the session when it exits.

I suspect that this problem is most visible on single-core machines.  On
a multi-core machine the pty driver, mosh-server, and sshd can run in a
different order, and the mosh-server child probably gets to close the pty
slave before sshd closes the pty master.

I wrote a Xenix serial driver ages ago, and I can see arguments for
either draining or dropping final output.  Mosh's behavior is also a
little questionable here, so I'm not sure whether to call this a kernel
pty bug or not.  Either way, the workaround in Mosh (calling tcdrain())
is easy enough, so I'm doing that anyway.
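The workaround amounts to something like the following sketch. It is hypothetical (announce_and_drain is not a Mosh function): after printing the connect line, flush stdio, and if the stream is a terminal, call tcdrain(), which POSIX defines as waiting until all output written to the fd has been transmitted. What "transmitted" means for a pty is up to the kernel. On FreeBSD I'd expect it to keep the writer waiting until the master side has consumed the data, which closes the window described above, while other kernels may return immediately.

```c
#define _XOPEN_SOURCE 600 /* fileno(), plus posix_openpt() and friends */
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper, not Mosh's code: print `line` on `out`, flush stdio,
 * and if the stream is a terminal, wait in tcdrain() until the kernel
 * reports the output as transmitted. Returns 0 on success, -1 on error
 * (EINTR is not retried in this sketch).
 */
int announce_and_drain(FILE *out, const char *line)
{
    if (fputs(line, out) == EOF || fflush(out) == EOF)
        return -1;
    int fd = fileno(out);
    if (isatty(fd) && tcdrain(fd) != 0)
        return -1;
    return 0;
}
```

In a server shaped like the one in the trace, the call would sit just before fork(), so the parent cannot exit while the connect line is still queued on the pty.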

regards,

  --jh



