From: John Baldwin
To: Matt Reimer
Cc: freebsd-fs@freebsd.org
Date: Tue, 24 Nov 2009 11:43:24 -0500
Subject: Re: Current gptzfsboot limitations
Message-Id: <200911241143.24034.jhb@freebsd.org>
References: <200911231018.40815.jhb@freebsd.org>

On Monday 23 November 2009 5:04:30 pm Matt Reimer wrote:
> On Mon, Nov 23, 2009 at 7:18 AM, John Baldwin wrote:
> > On Friday 20 November 2009 7:46:54 pm Matt Reimer wrote:
> >> I've been analyzing gptzfsboot to see what its limitations are. I
> >> think it should now work fine for a healthy pool with any number of
> >> disks, with any type of vdev, whether single disk, stripe, mirror,
> >> raidz or raidz2.
> >>
> >> But there are currently several limitations (likely in loader.zfs
> >> too), mostly due to the limited amount of memory available (< 640KB)
> >> and the simple memory allocators used (a simple malloc() and
> >> zfs_alloc_temp()).
> ...
> >>
> >> I think I've also hit a stack overflow a couple of times while debugging.
> >>
> >> I don't know enough about the gptzfsboot/loader.zfs environment to
> >> know whether the heap size could be easily enlarged, or whether there
> >> is room for a real malloc() with free(). loader(8) seems to use the
> >> malloc() in libstand. Can anyone shed some light on the memory
> >> limitations and possible solutions?
> >>
> >> I won't be able to spend much more time on this, but I wanted to pass
> >> on what I've learned in case someone else has the time and boot fu to
> >> take it the next step.
> >
> > One issue is that disk transfers need to happen in the lower 1MB due to BIOS
> > limitations. The loader uses a bounce buffer (in biosdisk.c in libi386) to
> > make this work ok. The loader uses memory > 1MB for malloc(). You could
> > probably change zfsboot to do that as well if not already. Just note that
> > drvread() has to bounce buffer requests in that case. The text + data + bss
> > + stack is all in the lower 640k and there's not much you can do about that.
> > The stack grows down from 640k, and the boot program text + data starts at
> > 64k with the bss following.
>
> Ah, the stack growing down from 640k explains a problem I was seeing
> where a memcpy() to a temp buf would restart gptzfsboot--it must have
> been overwriting the stack.
>
> > Hmm, drvread() might already be bounce buffering
> > since boot2 has to do so since it copies the loader up to memory > 1MB as
> > well.
>
> Looks like it's already bounce buffering. All the I/O drvread does is
> to statically allocated char arrays, and the data is copied when
> necessary, e.g. in vdev_read():
>
>     if (drvread(dsk, dmadat->rdbuf, lba, nb))
>         return -1;
>     memcpy(p, dmadat->rdbuf, nb * DEV_BSIZE);
>
> > You might need to use memory > 2MB for zfsboot's malloc() so that the
> > loader can be copied up to 1MB. It looks like you could patch malloc() in
> > zfsboot.c to use 4*1024*1024 as heap_next and maybe 64*1024*1024 as heap_end
> > (this assumes all machines that boot ZFS have at least 64MB of RAM, which is
> > probably safe).
>
> So are the page tables etc. already configured such that RAM above 1MB
> is ready to use in gptzfsboot? (I'm not familiar with the details of
> how virtual memory is handled on i386.)
>
> Thanks for your help John.

Paging is not enabled in the boot loader. Instead, the loader runs in a 32-bit
flat mode (but with an offset of 0xa000). Simply changing the constants for
heap_start and heap_end should be sufficient.

-- 
John Baldwin
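
For readers following the thread, here is a minimal sketch of the kind of allocator being discussed: a bump allocator with no free(), bounded by heap_next and heap_end. It is not the code from zfsboot.c. The names bump_malloc, arena and ARENA_SIZE are hypothetical, and a static array stands in for the fixed physical range (4*1024*1024 up to 64*1024*1024) suggested above, so that the sketch compiles and runs in userland.

```c
#include <stddef.h>

/*
 * Assumption: in the boot environment heap_next and heap_end would be set to
 * fixed addresses such as 4*1024*1024 and 64*1024*1024. A static arena is
 * used here instead so the sketch can run outside the boot loader.
 */
#define ARENA_SIZE (1024 * 1024)    /* hypothetical demo size */

static char arena[ARENA_SIZE];
static char *heap_next = arena;             /* next unallocated byte */
static char *heap_end = arena + ARENA_SIZE; /* one past the last usable byte */

/*
 * Bump allocation: hand out the next n bytes and advance heap_next.
 * Nothing is ever freed, so the heap only grows until it hits heap_end.
 */
static void *
bump_malloc(size_t n)
{
	char *p = heap_next;

	/* Compare against the space remaining to avoid pointer overflow. */
	if (n > (size_t)(heap_end - heap_next))
		return NULL;
	heap_next += n;
	return p;
}
```

With an allocator of this shape, enlarging the heap amounts to changing the two bounds, which matches the suggestion that only the constants need to change. Because buffers handed out above 1MB cannot be targets of BIOS disk transfers, reads still have to go through a low-memory buffer and be copied out, as the vdev_read() excerpt quoted above does.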