Subject: Re: Mosh regression between 10.x and 11-stable
To: Peter Jeremy
References: <20160810081831.GA65184@server.rulingia.com>
 <20160811101928.GC65184@server.rulingia.com>
 <68d0a6d4-2078-000c-6e22-b0b8721dfd2b@glup.org>
 <20160811195348.GD65184@server.rulingia.com>
 <971e1e82-1b95-d1e4-6053-a8548fc1d109@glup.org>
 <549fba5d-fb5e-ffe5-aec6-d03159628118@glup.org>
 <20160813083049.GF65184@server.rulingia.com>
From: john hood
Cc: mosh-devel@mit.edu, freebsd-current@freebsd.org, freebsd-ports@freebsd.org
Date: Sun, 14 Aug 2016 03:22:47 -0400
In-Reply-To: <20160813083049.GF65184@server.rulingia.com>

On 8/13/16 4:30 AM, Peter Jeremy wrote:
> Hi John,
>
> Sorry, I got side-tracked.
>
> On 2016-Aug-12 16:37:15 -0400, John Hood wrote:
>> Could I ask you to look at this a little further? On the one hand, it
>> sure looks like a Mosh issue, and tcdrain() solves it-- but on the other
>> hand, this is a regression on FreeBSD and we don't see this issue on any
>> other OS. I'd like to fully understand this and make sure that this is
>> not a kernel issue.
>
> It's got me puzzled as well. And it's only getting weirder... The
> following is using an unmodified mosh-1.2.5, built from the port, as
> the server on FreeBSD 11.0-BETA4 r303957. The client is 1.2.4a-1ubuntu1
> on Linux. The standard driver script consistently fails but adding a
> "print" makes it mostly work. Where there's a "[mosh is exiting.]"
> message, it was successful (and would report that there were other
> orphaned servers since I wasn't waiting the 60 seconds for servers to
> die between invocations). For completeness, I've tried ktrace'ing
> mosh-server but can't make it fail when I do so.

I've now managed to reproduce the issue *and* ktrace it (and sshd) on a
VPS, with a single-CPU VM on a badly-oversubscribed VMware host and OS X
client.

mosh-server shows a straightforward execution trace: the parent
successfully writes the MOSH CONNECT message on stdout, forks and exits.
About a millisecond later, the child starts running, writes the verbose
copyright/etc. message on stderr, and gets EIO (and ignores it).
Shortly thereafter it closes its pty slave fds on stdin/stdout/stderr.
It continues normally from there.

In that millisecond, the sshd trace shows that it catches SIGCHLD,
writes utmp info, wait()s for a child and gets mosh-server's pid, and
does a final read() that returns 0 bytes. It then closes the pty
master, without doing revoke() or any ioctls.

So the pty driver is clearly dropping the MOSH CONNECT message. It's a
lot less clear whether that's a bug. On the one hand, states that
output should be drained on final close (which in this case is done by
the forked mosh-server). On the other hand, the login shell is session
leader, and requires that the terminal be disassociated from the
session when it exits.

I suspect that this problem is most visible on single-core machines,
because on a multi-core machine the pty driver, mosh-servers, and sshd
will run with different ordering, and I suspect that the mosh-server
child gets to close the pty slave before sshd closes the pty master.

I wrote a Xenix serial driver ages ago and I can see arguments for
either draining or dropping final output. And Mosh's behavior is a
little questionable here. So I'm not sure whether to call this a kernel
pty bug or not. But the workaround in Mosh is easy enough, tcdrain(),
so I'm doing that anyway.

regards,

  --jh
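
P.S. For anyone following along, here is a rough sketch of the kind of
tcdrain() workaround described above. It is not the actual Mosh change:
write_and_drain() is a hypothetical helper name, and where exactly Mosh
calls tcdrain() is an assumption. The idea is just to write the message
in full and then wait for the tty output queue to empty before the
writer forks and exits.

```c
#include <errno.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from Mosh): write msg to fd in full,
 * then wait until the terminal has transmitted it.
 * Returns 0 on success, -1 on error with errno set.
 */
static int write_and_drain(int fd, const char *msg)
{
    size_t left = strlen(msg);

    while (left > 0) {
        ssize_t n = write(fd, msg, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        msg += n;               /* handle short writes */
        left -= (size_t)n;
    }

    /*
     * tcdrain() blocks until all output written to fd has been
     * transmitted. On a non-terminal (pipe, file) it fails with
     * ENOTTY, which is treated as harmless here since there is no
     * tty output queue that could be flushed.
     */
    while (tcdrain(fd) < 0) {
        if (errno == EINTR)
            continue;
        if (errno == ENOTTY)
            break;
        return -1;
    }
    return 0;
}
```

A caller would use something like write_and_drain(STDOUT_FILENO, ...)
for the connect message in the parent, before it forks and exits, so
the data has been handed to the master side before anyone can close
the pty.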