From owner-freebsd-arch@FreeBSD.ORG Wed Dec 13 02:26:38 2000
Date: Wed, 13 Dec 2000 02:26:38 -0800
From: Alfred Perlstein
To: arch@freebsd.org
Cc: net@freebsd.org
Subject: patch to cleanup inflight desciptor handling.
Message-ID: <20001213022637.A16205@fw.wintelcom.net>

Not a lot of people are familiar with fd passing, so I'll give a short
description:

By using AF_UNIX sockets between processes, a process can use sendmsg()
to send a file descriptor through the socket, and the other process
does a recvmsg() to pick up the descriptor.

The "problem" is that if a descriptor is in transit/inflight and the
sending process closes the file, the file still needs to remain open
for the recipient. What can happen is:

  process A: sendmsg(descriptor)
  process A: exit
  process B: exit

Without the garbage collection we'd have leaked a file descriptor
inside the kernel.

There's a pretty complex loop in sys/kern/uipc_usrreq.c that deals with
garbage collecting these inflight descriptors.

The problem with the garbage collection routine is that:

1) it's expensive, as it walks all the open files in the system at
   least twice.
2) it's ugly/hackish.
3) it will need to acquire global locks on kernel structure lists for
   significant amounts of time.
4) it complicates the code because certain things need to be done out
   of order, i.e. sorflush before sofree (which does the sorflush
   anyway).
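
For anyone who hasn't seen fd passing from userland, here is a minimal sketch of the sendmsg()/recvmsg() exchange described above. It is not part of the patch. The helper names send_fd(), recv_fd() and fd_passing_demo() are made up for illustration, and for simplicity both ends of the socketpair live in one process. The demo closes the sender's copy of the descriptor while it is still in flight, which is exactly the window the kernel has to cover.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor fd over AF_UNIX socket sock. */
static int
send_fd(int sock, int fd)
{
	union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
	char byte = 'x';	/* carry one byte of ordinary data as well */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cm = CMSG_FIRSTHDR(&msg);
	assert(cm != NULL);	/* control buffer is sized for one header */
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;	/* "these bytes are descriptors" */
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));

	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Hypothetical helper: receive one descriptor from sock, or -1. */
static int
recv_fd(int sock)
{
	union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;
	struct cmsghdr *cm;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return (-1);
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len < CMSG_LEN(sizeof(int)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cm), sizeof(fd));
	return (fd);
}

/* Pass the write end of a pipe, close the sender's copy, then use it. */
int
fd_passing_demo(void)
{
	int sv[2], p[2], rfd;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1 || pipe(p) == -1)
		return (-1);
	if (send_fd(sv[0], p[1]) == -1)
		return (-1);
	close(p[1]);		/* descriptor is now only "in flight" */
	if ((rfd = recv_fd(sv[1])) == -1)
		return (-1);
	if (write(rfd, "k", 1) != 1 || read(p[0], &c, 1) != 1)
		return (-1);
	close(rfd); close(p[0]); close(sv[0]); close(sv[1]);
	return (c == 'k' ? 0 : -1);
}
```

If both ends of the socketpair were instead closed before recv_fd() ran, nothing in userland would hold the pipe's write end any more, and it would be up to the kernel to notice and release it.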

The solution is actually taken from Linux. In Linux, all network
buffers can have a free routine callback invoked on them when the
buffer is deallocated.

FreeBSD only has a free routine available for M_EXT buffers (buffers
with external storage). The routine is called when

  (m_flags & M_EXT) != 0 && m_type != EXT_CLUSTER

To achieve my goal I made it so that all fd passing requires an mbuf
cluster, and took responsibility for freeing the mbuf cluster in my
callback.

I set m_type to EXT_CMSG_DATA and provide my own free routine until the
descriptors are read by the receiving process. If the descriptors are
read, then I restore it back to a "normal" mbuf with an attached
cluster to be free()'d.

Good things about this patch:

1) simplifies
   a) locking
   b) descriptor management
   c) the code in general
2) less latency, the gc routine can be expensive
3) some comments are added describing some other stuff that needs
   fixing (problems with rfork threads)
4) shrinks struct file by one int

Problems with this patch:

1) most fd passing probably only sends one descriptor at a time; by
   allocating clusters I'm wasting a lot more space, and taking more
   time to do the allocation.
2) the mbuf subsystem should provide macros to do what I'm doing
   (hijacking the free routine on an mbuf+cluster)
3) the mbuf subsystem should provide a way to get a callback on a
   single mbuf without a cluster attached.

http://people.FreeBSD.org/~alfred/inflight.diff

thanks,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."