Date: Tue, 11 Apr 2006 14:18:27 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Kris Kennaway <kris@obsecurity.org>
Cc: freebsd-stable@freebsd.org, Michael Schuh <michael.schuh@gmail.com>
Subject: Re: Maximum Swapsize
Message-ID: <200604112118.k3BLIRjF042154@apollo.backplane.com>
References: <1dbad3150604100913hff9fc4dsb125ea541675f992@mail.gmail.com> <1dbad3150604110356m6ca0e92mee07fe59c8973b0f@mail.gmail.com> <20060411202713.GB89949@xor.obsecurity.org>

From 'man tuning' (I think I wrote this, a long time ago):

     You should typically size your swap space to approximately 2x main
     memory. If you do not have a lot of RAM, though, you will generally
     want a lot more swap. It is not recommended that you configure any
     less than 256M of swap on a system and you should keep in mind future
     memory expansion when sizing the swap partition. The kernel's VM
     paging algorithms are tuned to perform best when there is at least 2x
     swap versus main memory. Configuring too little swap can lead to
     inefficiencies in the VM page scanning code as well as create issues
     later on if you add more memory to your machine. Finally, on larger
     systems with multiple SCSI disks (or multiple IDE disks operating on
     different controllers), we strongly recommend that you configure swap
     on each drive (up to four drives). The swap partitions on the drives
     should be approximately the same size. The kernel can handle
     arbitrary sizes but internal data structures scale to 4 times the
     largest swap partition. Keeping the swap partitions near the same
     size will allow the kernel to optimally stripe swap space across the
     N disks. Do not worry about overdoing it a little, swap space is the
     saving grace of UNIX and even if you do not normally use much swap,
     it can give you more time to recover from a runaway program before
     being forced to reboot.

--

The last sentence is probably the most important. The primary reason why
you want to configure a fairly large amount of swap has less to do with
performance and more to do with giving the system admin a long runway to
have the time to deal with unexpected situations before the machine blows
itself to bits.

The swap subsystem has the following limitation:

    /*
     * If we go beyond this, we get overflows in the radix
     * tree bitmap code.
     */
    if (nblks > 0x40000000 / BLIST_META_RADIX / nswdev) {
            printf("exceeded maximum of %d blocks per swap unit\n",
                0x40000000 / BLIST_META_RADIX / nswdev);
            VOP_CLOSE(vp, FREAD | FWRITE, td);
            return (ENXIO);
    }

By default, BLIST_META_RADIX is 16 and nswdev is 4, so the maximum number
of blocks *PER* swap device is 16 million. If PAGE_SIZE is 4K, the
limitation is 64 GB per swap device and up to 4 swap devices (256 GB total
swap).

The kernel has to allocate memory to track the swap space. This memory is
allocated and managed by kern/subr_blist.c (assuming you haven't changed
things since I wrote it). This is basically implemented as a flattened
radix tree using a fixed radix of 16. The memory overhead is fixed (based
on the amount of swap configured) and comes to approximately 2 bits per VM
page. Performance is approximately O(log N).

Additionally, once pages are actually swapped out the VM object must
record the swap index for each page. This costs around 4 bytes per
swapped-out page and is probably the greatest limiting factor in the
amount of swap you can actually use. 256GB of 100% used swap would eat
256MB of kernel ram.

I believe that large linear chunks of reserved swap, such as used by MD,
currently still require the per-page overhead. However, theoretically,
since the reservation model uses a radix tree, it *IS* possible to reserve
huge swaths of linear-addressed swap space with no per-page storage
requirements in the VM object.

It is even possible to do away with the 2 bits per page that the radix
tree uses if the radix tree were allocated dynamically. I decided against
doing that because I did not want the swap subsystem to be reliant on
malloc() during critical low-memory paging situations.

                                        -Matt
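
The figures in the message above (16 million blocks, 64 GB per device, 256 GB total, 256MB of kernel ram at full use) can be rechecked with a small userland C sketch. This is only an illustration of the arithmetic, not kernel code: the constants are the defaults quoted in the message, the helper names and the SWAP_PAGE_SIZE and NSWDEV macros are made up for this example, and the "2 bits per page" and "4 bytes per swapped-out page" costs are the approximate values given above rather than measured ones.

```c
#include <stdint.h>

/* Defaults quoted in the message; assumptions, not read from a kernel. */
#define BLIST_META_RADIX 16
#define NSWDEV           4        /* hypothetical stand-in for nswdev */
#define SWAP_PAGE_SIZE   4096ULL  /* hypothetical stand-in for PAGE_SIZE */

/* Same expression as the kernel check: blocks allowed per swap device. */
uint64_t max_blocks_per_device(void)
{
    return 0x40000000ULL / BLIST_META_RADIX / NSWDEV;
}

/* Per-device limit in bytes, assuming one block per page. */
uint64_t max_bytes_per_device(void)
{
    return max_blocks_per_device() * SWAP_PAGE_SIZE;
}

/* Total limit across all swap devices. */
uint64_t max_total_swap_bytes(void)
{
    return max_bytes_per_device() * NSWDEV;
}

/* Fixed radix-tree bitmap cost: roughly 2 bits per configured page. */
uint64_t blist_overhead_bytes(uint64_t configured_swap_bytes)
{
    uint64_t pages = configured_swap_bytes / SWAP_PAGE_SIZE;
    return pages * 2 / 8;
}

/* VM object cost: roughly 4 bytes per page actually swapped out. */
uint64_t swap_index_overhead_bytes(uint64_t used_swap_bytes)
{
    uint64_t pages = used_swap_bytes / SWAP_PAGE_SIZE;
    return pages * 4;
}
```

Under these assumptions, max_blocks_per_device() gives 16777216, max_total_swap_bytes() gives 256 GB, and swap_index_overhead_bytes() for 256 GB of fully used swap gives 256 MB, matching the numbers in the message. The fixed bitmap cost for the same 256 GB works out to about 16 MB, which is why the per-page swap index is the larger limiting factor.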