From: Alan Cox
To: ticso@cicely.de
Cc: freebsd-hackers@freebsd.org, Linda Messerschmidt, alc@cs.rice.edu
Reply-To: alc@freebsd.org
Date: Sat, 12 Dec 2009 13:50:52 -0600
Subject: Re: Superpages on amd64 FreeBSD 7.2-STABLE
In-Reply-To: <20091210145052.GX20668@cicely7.cicely.de>
References: <237c27100911260714x2fcb194ew1e6ce11e764efd08@mail.gmail.com> <200912090907.33433.jhb@freebsd.org> <20091210145052.GX20668@cicely7.cicely.de>

On Thu, Dec 10, 2009 at 8:50 AM, Bernd Walter wrote:

> On Wed, Dec 09, 2009 at 09:07:33AM -0500, John Baldwin wrote:
> > On Thursday 26 November 2009 10:14:20 am Linda Messerschmidt wrote:
> > > It's not clear to me if this might be a problem with the superpages
> > > implementation, or if squid does something particularly horrible to
> > > its memory when it forks to cause this, but I wanted to ask about it
> > > on the list in case somebody who understands it better might know
> > > what's going on. :-)
> >
> > I talked with Alan Cox some about this off-list and there is a case that
> > can cause this behavior: if the parent squid process takes write faults
> > on a superpage before the child process has called exec(), then it can
> > result in superpages being fragmented and never reassembled. Using
> > vfork() should prevent this from happening. It is a known issue, but it
> > will probably be some time before it is addressed. There is lower
> > hanging fruit in other areas in the VM that will probably be worked on
> > first.
>
> The whole thread puzzles me.
> Especially because vfork is often called a solution.
>
> Scenario A
> Parent with super page
> fork/exec
> This problem can happen because there is a race.
> The parent now has its super pages fragmented permanently!?
> The child throws away his pages because of the exec!?
>
> Scenario B
> Parent with super page
> vfork/exec
> This problem won't happen because the child has no pseudo copy of the
> parent's memory and then starts with a completely new map.
>
> Scenario C
> Parent with super page
> fork/ no exec
> The problem can happen because the child shares the same memory over
> its complete lifetime.
> The parent can get its super pages fragmented over time.
>

I'm not sure how you are defining "problem". If we define "problem" as I would, i.e., that "re-promotion can never occur", then Scenario C is not a problem scenario; only Scenario A is.

The source of the problem in Scenario A is basically that we have two ways of handling copy-on-write faults. Before the exec() occurs, copy-on-write faults are handled as you might intuit from the name: a new physical copy is made. If the entirety of the 2MB region is written to before the exec(), then this region will be promoted to a superpage. However, once the exec() occurs, copy-on-write faults are "optimized". Specifically, the kernel recognizes that the underlying physical page is no longer shared with the child and simply restores write access to it. It is the combination of these two methods that effectively blocks re-promotion, because the underlying 4KB physical pages within a 2MB region are no longer contiguous.

In other words, once the first page within a region has been copied, you have a choice to make: do you perform avoidable copies, or do you abandon the possibility of ever creating a superpage? The former has a significant one-time cost and the latter has a small recurring cost. Not knowing how much the latter will add up to, I chose the former. However, that choice may change in time, particularly if I find an effective heuristic for choosing between the two options.

Anyway, please keep trying superpages with large memory applications like this. Reports like this help me to prioritize my efforts.

Regards,
Alan
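
The interaction between the two copy-on-write fault paths described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the FreeBSD VM code: the names simulate() and promotable(), the frame numbers, and the idea that copies land at the same offset in one fresh, aligned 2MB run are all hypothetical simplifications.

```python
PAGES_PER_SUPERPAGE = 512  # one 2MB region divided into 4KB pages


def simulate(writes_before_exec, writes_after_exec):
    """Return (frames, writable) for the parent's 2MB region after fork().

    Toy model only: frame numbers and the fresh run used for copies
    are assumptions made for illustration.
    """
    old_base = 0                    # frames shared with the child after fork()
    new_base = PAGES_PER_SUPERPAGE  # hypothetical fresh, aligned 2MB run for copies
    frames = [old_base + i for i in range(PAGES_PER_SUPERPAGE)]
    writable = [False] * PAGES_PER_SUPERPAGE  # fork() made every page copy-on-write

    # Before exec(): the page is still shared, so a write fault makes a new copy.
    for i in writes_before_exec:
        if not writable[i]:
            frames[i] = new_base + i
            writable[i] = True

    # After exec(): the child no longer shares the page, so the fault is
    # "optimized" and write access is restored to the original frame.
    for i in writes_after_exec:
        if not writable[i]:
            writable[i] = True

    return frames, writable


def promotable(frames, writable):
    """True if all pages are writable and sit in one aligned, contiguous run."""
    base = frames[0]
    return (
        base % PAGES_PER_SUPERPAGE == 0
        and all(writable)
        and all(f == base + i for i, f in enumerate(frames))
    )


if __name__ == "__main__":
    all_pages = range(PAGES_PER_SUPERPAGE)
    print(promotable(*simulate(all_pages, [])))   # every page copied before exec()
    print(promotable(*simulate([], all_pages)))   # every fault handled after exec()
    print(promotable(*simulate([0], all_pages)))  # one copy, then exec()
```

In this model the first two cases print True and the third prints False: a single copy made before the exec(), followed by "optimized" faults afterwards, leaves the region backed by frames from two different runs, so it can no longer be promoted.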