From owner-freebsd-stable@FreeBSD.ORG Tue Dec 4 11:22:58 2007 Return-Path: Delivered-To: stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 523E316A419; Tue, 4 Dec 2007 11:22:58 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 0820113C457; Tue, 4 Dec 2007 11:22:57 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 5718446B43; Tue, 4 Dec 2007 06:27:45 -0500 (EST) Date: Tue, 4 Dec 2007 11:22:49 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ed Schouten In-Reply-To: <20071204102328.GK72574@hoeg.nl> Message-ID: <20071204111050.W30376@fledge.watson.org> References: <20071203225800.S30376@fledge.watson.org> <20071204102328.GK72574@hoeg.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: stable@FreeBSD.org, current@FreeBSD.org Subject: Re: Attention 7.x and 8.x ptmx/pts users (read if you set kern.pts.enable=1) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Dec 2007 11:22:58 -0000 On Tue, 4 Dec 2007, Ed Schouten wrote: > * Robert Watson wrote: >> Unfortunately, the current implementation is subject to a potential >> resource leak: the pty is created when the lookup occurs, but if the open >> never takes place, then the pty is leaked. In principle, we have >> facilities to GC unused device nodes "eventually", although not a race-free >> way to determine that no race occurs, assuming that we implemented that. >> This leakage turns out to interact particularly poorly with our resource >> limits on pty/pts pairs -- both the administrative limit imposed by sysctl >> and also the functional limit on the number of entries in /etc/ttys. It's >> possible to imagine various sometimes messy techniques of performing this >> garbage collection. > > So this is the same issue I sent a message to arch@ about some time ago, > that /dev/ptmx already returns a reference to the new pty, already when you > stat(2) it (for example by running `ls -l /dev/ptmx')? Yes. There's also another known issue, likely not corrected by this patch, in which closing the pty before the pts fails to properly wake up processes hung off the pts and inform them of its impending doom, resulting in the pty/pts pair never being garbage-collected. I've not tracked this down yet, but you can reproduce it by running screen(1) and then "killing" a screen. screen(1) closes the pty and relies on the pty/pts mechanism to do the rest, which doesn't. >> Instead, what I'd like to do is modify the ptmx code to have a race-free >> protocol, in which eventual termination of processes referencing the node >> results in freeing of the nodes. On some systems, ptmx performs a >> "bait-and-switch", in which the file descriptor of the pty node is silently >> substituted for the file descriptor of the ptmx code--similar to our model, >> only no window between lookup and open, but also not easily supported in >> our current VFS. Another possibility is to introduce a new system call and >> bypass ptmx entirely -- similar to pipe(), socketpair(), etc. > > I actually think that this sounds pretty nice. You mean something like an > in-kernel implementation for openpty()? Yes -- this is something John Baldwin has suggested, but I admit to having some reservations. As I see it, the choice really comes down to complexity -- because the implementation of pts is already very complicated, we want to pick the easiest implementation to understand and maintain. Unfortunately, I'm not sure I see adding a system call as necessarily easier -- you have to create the pts device node, because ttys require a device node on which applications can operate -- to adjust echo, to query speed, program control key behavior, write for the purposes of write(1), talk(1), etc. This means that somehow, we need to instantiate the pts device node and perform an open on it -- I think in practice it's more conservative of code to do that in user space than kernel, as we can reuse the current open() code in its entirety, rather than replicating the kernel code to instantiate a file descriptor in the process, find and hook up the device node to it, etc. So I suppose the open question is the pty master node -- right now, we instantiate the node on-demand when /dev/ptmx is looked up, but as discussed, devfs has a significant weakness in this area. The revised code instantiates the node as a result of an ioct on a true /dev/ptmx node, and then the master can be opened using open() in the process. While the GC code isn't all that fun, it's not all that complicated, and reflected a very minor patch on both the user and kernel code. This suggests that a system call should, rather than creating both endpoints and hooking them up to file descriptors, instead create only a master file descriptor and give it no actual instantiation in devfs. > Another thing that would make the TTY code a little bit cleaner in my > opinion is removing the PRIV_TTY_PRISON check and making something generic > inside devfs. If we have proper garbage collecting on TTY's, then we can > just change make_dev_cred() to bind the new device node to a certain jail. > That way you could even choose to hide nodes in /dev that don't belong to > the jail in question. I'll have to think about this. BTW, while reading the pts/pty code again, I began to wonder if the comments about freeing tty's are still accurate... Robert N M Watson Computer Laboratory University of Cambridge