From owner-freebsd-questions@FreeBSD.ORG Fri Jan 7 18:16:48 2005
Return-Path:
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5F3A516A4CE
	for ; Fri, 7 Jan 2005 18:16:48 +0000 (GMT)
Received: from stewie.obfuscated.net (stewie.obfuscated.net [66.118.188.125])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D0B6B43D3F
	for ; Fri, 7 Jan 2005 18:16:47 +0000 (GMT)
	(envelope-from m@obmail.net)
Received: from [192.168.1.103] (653259hfc120.tampabay.rr.com [65.32.59.120])
	(using TLSv1 with cipher RC4-SHA (128/128 bits))
	(No client certificate requested)
	by stewie.obfuscated.net (Postfix) with ESMTP id E53F36104;
	Fri, 7 Jan 2005 13:16:46 -0500 (EST)
In-Reply-To: <20050107173333.GA865@procyon.nekulturny.org>
References: <41DDB2A7.8020001@wilderness.dyn.dhs.org>
	<41DE0F6F.3040303@taborandtashell.net> <1105100701.640.6.camel@chaucer>
	<20050107173333.GA865@procyon.nekulturny.org>
Mime-Version: 1.0 (Apple Message framework v619)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Message-Id: <2EBCB4AD-60D8-11D9-B88F-00039367611E@obmail.net>
Content-Transfer-Encoding: 7bit
From: M
Date: Fri, 7 Jan 2005 13:15:59 -0500
To: Danny MacMillan
X-Mailer: Apple Mail (2.619)
cc: Mike Jeays
cc: Laurence Sanford
cc: tkelly-freebsd-questions@taborandtashell.net
cc: FreeBSD Mailing List
Subject: Re: Remote upgrade possible?
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 07 Jan 2005 18:16:48 -0000

On Jan 7, 2005, at 12:33 PM, Danny MacMillan wrote:

> I haven't looked at the code, but your assertion is extremely unlikely.
> I really want to say "impossible" but as I said, I haven't looked at
> the code. If FreeBSD loaded entire executable images into RAM when
> starting new processes, it would perform very poorly.
> What is more likely is that the kernel keeps the image file open
> during program execution. When the xterm binary is replaced, the old
> binary is still on disk in its old location, it just doesn't have any
> directory entries pointing to it. Since the kernel still has the file
> open it won't be overwritten. Hence the kernel can and will still load
> pages from the old image. This is a function of the same behaviour
> that causes df and du output to differ in some cases.
>
> The lsof(8) utility seems to bear this out, as each process seems to
> keep each image (program and shared object files) open during
> execution.
>
> A new instance of xterm would use the new, upgraded binary.

When you run a program, the program that starts it first makes a copy
of itself in the process table, and the two share code pages. This is
done through fork(). The new process, called the child, then calls one
of the exec() functions, all of which end up in a single syscall,
execve().

execve() uses namei() to get the vnode pointer. Each vnode has three
reference counts: v_usecount, v_holdcnt and v_writecount. A vnode is
not recycled until both the usecount and holdcnt are 0. When namei() is
called it calls VREF(), which is vref(), which does vp->v_usecount++;
so the vnode can't be recycled while the program is running, starting
from a point in time before the program is actually loaded into memory.

execve() calls exec_map_first_page(). Without tearing this apart I'm
going to guess that this memory maps the first page of text (code)
through the VM subsystem, as evidenced by the conspicuous calls to
vm_page*() functions, so I'd conclude the file is memory mapped.

Presuming the command you're calling turns out not to be a shell script
or other script, execve() cleans up the environment so file descriptors
and signal handlers don't get shared, sets up the process's
environment, lets the calling (forking) process know it can continue on
its merry way, sets uid/gid if necessary/possible, and it looks like
the scheduler takes care of the rest. (I'll be honest here: so far as I
can tell, the code seems to trail off at this point into parts that are
jumped to in case of error.) In any case we have an increased usecount.

Now we are going to unlink that file and create a new one. After some
basic checks (you can't remove the root of a file system, for example)
unlink() will call VOP_REMOVE(), which calls vrele(), which decrements
the usecount when it's greater than one. In this case it MUST be,
because the xterm process has one count on it and the file entry has
another (hard links to the file may hold additional counts). Therefore
it appears that you can unlink the file and install a new copy, and the
old file will remain on the disk to serve the memory mapped image used
by the running process.

I'm going to presume that when a process exits it decrements the
usecount for the vnode, which, when it reaches 0, should put the vnode
on the free list.