Date: Sun, 9 Dec 2007 07:32:06 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Sonja Milicic
Cc: freebsd-hackers@freebsd.org
Subject: Re: Large array in KVM

On Thu, 6 Dec 2007, Sonja Milicic wrote:

> I'm working on a kernel module that needs to maintain a large structure in
> memory. As this structure could grow too big to be stored in memory, it
> would be good to offload parts of it to the disk. What would be the best
> way to do this? Could using a memory-mapped file help?

Sonja,

I think the answer depends a bit on just how large the data is. The two most
critical limits are consumption of physical memory and consumption of
address space.

There are several parts of the kernel that deal with these sorts of
scenarios for various reasons. You might take a look at the pipe code, which
maps pageable buffers into kernel address space, and the md(4) code, which
can provide swap-backed virtual disk storage. And, of course, the file
system is the quintessential kernel subsystem that brings data in and out of
memory from disk :-).

On 64-bit systems, address space limits won't be much of a concern in most
scenarios, but on 32-bit systems the kernel address space is quite small
(512MB/1GB in most configurations), and as such is both significantly
smaller than physical memory and potentially quite full on busy systems. On
32-bit systems it is therefore critical to manage address space use and not
just memory use, so it may not be possible to simply map and use large
amounts of memory without careful planning.

If you're talking about a relatively small amount of memory -- e.g., a few
megabytes -- that you want to be pageable, the pipe code is a good
reference. Remember that page faults may sleep for an extended period, so
you need to avoid touching potentially paged-out memory while holding
mutexes, rwlocks, or critical sections, as well as from non-sleepable
contexts such as interrupt threads. Using the VM system, you can explicitly
manage the paging, or you can simply make sure to touch the memory only in
safe contexts, such as from the kernel portions of user threads when either
no locks are held or only sleepable locks (such as lockmgr(9) or sx(9)) are
held.
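To make that lock and context discipline concrete, here is a minimal sketch
of the "safe contexts only" approach. The names (big_table, big_table_lock,
M_BIGTABLE, and the big_table_* functions) are invented for illustration,
and plain malloc(9) memory -- which is actually wired, not pageable --
stands in purely as a placeholder for whatever pageable backing store you
end up with; the point is only that all accesses happen under a sleepable
sx(9) lock, so a page fault taken while touching the data could safely
sleep:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/sx.h>

static MALLOC_DEFINE(M_BIGTABLE, "bigtable", "large module table");

static struct sx big_table_lock;  /* Sleepable, so faults under it may sleep. */
static char *big_table;
static size_t big_table_size;

static void
big_table_init(size_t size)
{

        sx_init(&big_table_lock, "bigtable");
        big_table = malloc(size, M_BIGTABLE, M_WAITOK | M_ZERO);
        big_table_size = size;
}

/*
 * Touch the table only from a sleepable context (e.g. the kernel side of a
 * user thread), holding only the sx lock -- never a mutex, rwlock, or
 * critical section -- since a fault here could sleep for a long time.
 */
static void
big_table_update(size_t offset, char value)
{

        sx_xlock(&big_table_lock);
        if (offset < big_table_size)
                big_table[offset] = value;
        sx_xunlock(&big_table_lock);
}

static void
big_table_destroy(void)
{

        free(big_table, M_BIGTABLE);
        sx_destroy(&big_table_lock);
}

The sleepable sx(9) lock is the important part of the sketch: a mutex may
not be held across anything that can sleep, so it could not safely protect
memory that might have to be faulted back in.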
For larger amounts of memory, you will probably want to maintain your own
cache of data, loaded explicitly or mapped and faulted explicitly, because
of address space limits. You may find that you want to interact directly
with the buffer cache/VM system, and may find that your code ends up looking
a bit like a file system itself (a rough skeleton of such an explicitly
managed cache is sketched at the end of this message).

So, in brief summary: consider both physical memory and address space
limitations, and to what extent you'll need to manage their use to prevent
exhaustion of either resource. You also need to be careful about the locks
held and the contexts from which you might need to fault in data. The file
system code, the pipe code, and the md(4) code are all useful reference
material.

Robert N M Watson
Computer Laboratory
University of Cambridge
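Following up on the "maintain your own cache" suggestion above, a skeleton
of the bookkeeping might look something like the sketch below. Everything
here is invented for illustration (struct bt_chunk, bt_cache_lookup, the
bt_read_chunk/bt_write_chunk stubs, the M_BTCACHE malloc type, and the chunk
and residency bounds); the actual backing-store I/O is left as trivial
stubs, and this shows only an explicitly managed, bounded LRU cache, not how
the pipe, md(4), or file system code does it:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/queue.h>
#include <sys/sx.h>
#include <sys/systm.h>

#define BT_CHUNKSIZE    (64 * 1024)     /* Bytes kept resident per chunk. */
#define BT_MAXCHUNKS    64              /* Bound on resident chunks. */

static MALLOC_DEFINE(M_BTCACHE, "btcache", "bounded chunk cache");

struct bt_chunk {
        TAILQ_ENTRY(bt_chunk) bc_lru;   /* LRU list linkage. */
        off_t bc_offset;                /* Offset into the backing store. */
        char *bc_data;                  /* BT_CHUNKSIZE bytes of data. */
};

static TAILQ_HEAD(bt_lru_head, bt_chunk) bt_lru =
    TAILQ_HEAD_INITIALIZER(bt_lru);
static int bt_nchunks;
static struct sx bt_lock;       /* sx_init()ed at module load (not shown). */

/*
 * Stubs: move one chunk between memory and whatever backing store you
 * choose (a vnode, a swap-backed object, ...).  Real I/O goes here.
 */
static int
bt_read_chunk(off_t offset __unused, char *data)
{

        bzero(data, BT_CHUNKSIZE);
        return (0);
}

static int
bt_write_chunk(off_t offset __unused, char *data __unused)
{

        return (0);
}

/*
 * Return the resident chunk covering 'offset', loading it explicitly if
 * necessary and evicting the least recently used chunk once the residency
 * bound is hit.  Call this only from a sleepable context with no mutexes,
 * rwlocks, or critical sections held, since the I/O may sleep.
 */
static struct bt_chunk *
bt_cache_lookup(off_t offset)
{
        struct bt_chunk *bc;

        offset -= offset % BT_CHUNKSIZE;
        sx_xlock(&bt_lock);
        TAILQ_FOREACH(bc, &bt_lru, bc_lru) {
                if (bc->bc_offset == offset) {
                        /* Hit: move to the front of the LRU list. */
                        TAILQ_REMOVE(&bt_lru, bc, bc_lru);
                        TAILQ_INSERT_HEAD(&bt_lru, bc, bc_lru);
                        sx_xunlock(&bt_lock);
                        return (bc);
                }
        }
        if (bt_nchunks >= BT_MAXCHUNKS) {
                /* Miss with the cache full: write back and recycle LRU. */
                bc = TAILQ_LAST(&bt_lru, bt_lru_head);
                TAILQ_REMOVE(&bt_lru, bc, bc_lru);
                if (bt_write_chunk(bc->bc_offset, bc->bc_data) != 0)
                        printf("bt_cache: write-back at %jd failed\n",
                            (intmax_t)bc->bc_offset);
        } else {
                /* Miss with room to spare: allocate a new resident chunk. */
                bc = malloc(sizeof(*bc), M_BTCACHE, M_WAITOK | M_ZERO);
                bc->bc_data = malloc(BT_CHUNKSIZE, M_BTCACHE, M_WAITOK);
                bt_nchunks++;
        }
        bc->bc_offset = offset;
        if (bt_read_chunk(offset, bc->bc_data) != 0)
                bzero(bc->bc_data, BT_CHUNKSIZE);
        TAILQ_INSERT_HEAD(&bt_lru, bc, bc_lru);
        sx_xunlock(&bt_lock);
        return (bc);
}

A real module would also want a teardown path that writes back and frees the
resident chunks and destroys bt_lock, and probably a hash lookup rather than
a linear LRU walk once the chunk count grows, but the shape of the
bookkeeping -- bounded residency, explicit load and write-back, sleepable
locking only -- stays the same.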