Date: Mon, 3 Dec 2007 23:37:50 +0000 (GMT)
From: Robert Watson
To: current@FreeBSD.org
Cc: stable@FreeBSD.org
Subject: Attention 7.x and 8.x ptmx/pts users (read if you set kern.pts.enable=1)

(If you aren't interested in the details of our ptmx/pty/pts driver, skip to the paragraph that begins "Why the long-winded story?")

Dear all:

The current ptmx/pts implementation makes use of devfs(4) cloning. A user process wanting to allocate a pty/pts pair opens /dev/ptmx, which returns a reference to a new pty master. An ioctl is then performed to query which pts number was allocated, and the pts device is then opened. Internally, the lookup of /dev/ptmx causes the driver to instantiate the pty, and when the pty is opened, the pts is created.
The pty and pts nodes are both destroyed on last close, so the pair is cleaned up automatically when the last process attached to it exits. Sounds good. :-)

Unfortunately, the current implementation is subject to a potential resource leak: the pty is created when the lookup occurs, but if the open never takes place, the pty is leaked. In principle, we have facilities to garbage-collect unused device nodes "eventually", but even if we implemented that, we have no race-free way to determine that an open is not about to occur. This leakage turns out to interact particularly poorly with our resource limits on pty/pts pairs: both the administrative limit imposed by sysctl and the functional limit on the number of entries in /etc/ttys.

It's possible to imagine various, sometimes messy, techniques for performing this garbage collection. Instead, what I'd like to do is modify the ptmx code to use a race-free protocol, in which the eventual termination of the processes referencing the nodes results in the nodes being freed.

On some systems, ptmx performs a "bait-and-switch", in which the file descriptor of the pty node is silently substituted for the file descriptor of the ptmx node. This is similar to our model, but with no window between lookup and open; it is also not easily supported by our current VFS. Another possibility is to introduce a new system call and bypass ptmx entirely, similar to pipe(), socketpair(), etc.

The change that seemed least disruptive, and which I have implemented, makes ptmx a true device node (not a devfs clone) and adds an ioctl that allocates the pty/pts pair; the pair is also added to a garbage collection list. If the ptmx node is closed *before* the pty is opened, the nodes are garbage collected. It turns out this also isn't easily implementable in our VFS, as we neither offer a per-file-descriptor opaque pointer for use by device drivers nor pass the file descriptor pointer to the device driver (as, say, Linux does).
At some point this functionality will turn up, as there has been consistent interest in it over time. What I've done instead is implement an approximation of that model: an "open counter" for ptmx which, when it drops to zero across all references, triggers a garbage collection sweep. If and when we can use per-file-descriptor state, it is easy to modify this to sweep on close of a specific descriptor.

--> start reading here if you were bored by the above

Why the long-winded story? Well, this turns out to change the convention by which libc communicates with the kernel. Instead of simply opening ptmx and then performing an ioctl to find the pts, we now open ptmx, perform an ioctl to allocate the pair, and then open both the pty and pts nodes explicitly. Thus libc requires modification: a libc that knows how to speak to the old ptmx doesn't know how to speak to the new one, and, in effect, vice versa.

This doesn't meet our ABI requirements for a stable branch, so I plan to withdraw the ptmx/pts implementation from 7.0 before the release by disabling it in the kernel and libc. This will keep us from nailing down the old ABI, and we'll instead merge the revised protocol for 7.1.

This change will, however, affect users of the 8-CURRENT branch. During an upgrade cycle, libc and the kernel are likely to be out of sync, so if pts support is enabled (via the kern.pts.enable sysctl), pty devices will not be available. That might crimp the style of anyone performing a remote upgrade via, say, ssh.

So, this is notice of two upcoming changes:

(1) kern.pts.enable will be removed in 7.x, to be reintroduced in 7.1. If kern.pts.enable was set, your system will silently revert to using old-style ptys, and attempting to set the sysctl will return an error.

(2) I will merge the revised ptmx implementation to 7.x, potentially disrupting use of pty/pts devices for users who have kern.pts.enable explicitly set to a non-zero value.
Hopefully this will resolve the known resource leaks in the ptmx code and get us on track to enable it by default in 8.x in the near future, and at least to offer it as a production feature in 7.x.

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge