From: Alan Cox
Date: Fri, 08 Oct 2010 02:46:21 -0500
To: Andriy Gapon
Cc: Alan Cox, Garrett Cooper, freebsd-current@freebsd.org
Subject: Re: minidump size on amd64
Message-ID: <4CAECC4D.90707@rice.edu>
In-Reply-To: <4CAE0060.7050607@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

Andriy Gapon wrote:
> on 01/10/2010 10:43 Andriy Gapon said the following:
>
>> The idea. We dump contiguously only pages with PDEs (which means both valid and
>> invalid PDEs), valid pages with PTEs are dumped the same way as data physical
>> pages (i.e. via dump_add_page, etc); no fake PTEs for 2MB pages.
>> PDE area of the dump takes about 20MB as opposed to 1GB for PTE area (the math
>> is obvious, but just in case).
>>
>> libkvm is changed to treat former PTE area as PDE area and is also taught to
>> understand PG_PS in PDE.
>> There is now an overhead of having to first read a PTE page in V-to-P-to-offset
>> lookup for !PG_PS case. Perhaps we could cache all PTEs in memory and have a
>> lookup table for them, but I didn't bother with this possibly premature
>> optimization at this time.
>>
>> There is an unrelated change in minidumpsys - "bitmap_frozen".
>> I had to do it despite having a patch in my local tree to stop other CPUs on
>> panic->dump. Code in dump path (peripheral disk driver, CAM, SIM driver,
>> something else?) seems to do some memory allocations and change dump bitmap,
>> which leads to a mismatch between dump size and dump bitmap; and also
>> potentially to inconsistencies in the bitmap itself. So I decided that it's a
>> good idea to freeze the bitmap once we decided what pages we want to dump.
>>
>> Some variables and structure fields with 'pte' in them should probably be
>> renamed to have 'pde' instead.
>>
>
> Here's an updated patch:
> http://people.freebsd.org/~avg/amd64-minidump.3.diff
>
> I went ahead and changed 'pte' to 'pde' in various names.
> Also, I ditched somewhat questionable "bitmap_frozen" approach and instead opted
> for restarting a dump on a size mismatch. This was suggested by kib@.
> Garrett Cooper has pointed out some problems with bitmap_frozen approach.
> I think that actual problem was a scenario where a dump is done, then the system
> is allowed to continue and then another dump is done. An exotic case perhaps? :-)
>
> One probably desirable feature that is missing is backward compatibility in
> libkvm. If that is a showstopper, then I'll have to work on preserving it.
>
> As usual, I will appreciate any feedback - reviews, testing, etc.
>

The kernel part of the patch looks good. That said, I have one
suggestion. The current generation of AMD and Intel processors has
support for 1GB pages. If you want to make sure that this change will
last us a long time, I would suggest translating the old trick of
generating a fake page table page for 2MB pages into generating a fake
page directory page for 1GB pages, rather than disposing of this code.

Alan
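
To make the suggestion above concrete, here is a minimal sketch of what
generating a fake page directory page for a 1GB mapping might look like.
It is an illustration only, not code from the patch or from the kernel:
the helper name fake_pd_page is hypothetical, the constant names merely
mirror the usual amd64 macro names, and the flag set given to the fake
entries is an assumption. The bit positions themselves are the
architectural ones.

```python
# Architectural amd64 page-table entry bits. The names mirror common
# kernel macro names, but this file is a standalone sketch.
PG_V = 0x001    # entry is valid (present)
PG_RW = 0x002   # writable
PG_A = 0x020    # accessed
PG_M = 0x040    # modified (dirty)
PG_PS = 0x080   # large page: 2MB in a PDE, 1GB in a PDPE

NPDEPG = 512                       # entries in one page directory page
NBPDR = 1 << 21                    # bytes mapped by one PDE (2MB)
PG_1G_FRAME = 0x000FFFFFC0000000   # physical address bits 51:30


def fake_pd_page(pdpe):
    """Expand one 1GB-page PDPE into 512 fake PDEs (hypothetical helper).

    Each fake PDE maps one 2MB slice of the 1GB region and has PG_PS
    set, so a reader that understands PG_PS in a PDE needs no further
    page-table pages for this region.
    """
    if not (pdpe & PG_V) or not (pdpe & PG_PS):
        raise ValueError("not a valid 1GB mapping")
    base = pdpe & PG_1G_FRAME
    # Assumed flag set for the fake entries; NX and other high bits of
    # the original entry are dropped by the frame mask.
    flags = PG_V | PG_RW | PG_A | PG_M | PG_PS
    return [(base + i * NBPDR) | flags for i in range(NPDEPG)]
```

With something along these lines the dump format would stay at "one page
of PDEs per 1GB of KVA" even when the kernel itself uses a 1GB page, so a
consumer such as libkvm would only ever have to handle PDEs with or
without PG_PS.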