Date:      Thu, 28 Oct 1999 20:41:17 -0400
From:      Sergey Babkin <babkin@bellatlantic.net>
To:        Michael Beckmann <petzi@apfel.de>
Cc:        Matthew Dillon <dillon@apollo.backplane.com>, Julian Elischer <julian@whistle.com>, hackers@FreeBSD.ORG
Subject:   Re: Limitations in FreeBSD
Message-ID:  <3818ED2D.8DE2F050@bellatlantic.net>
References:  <199910282143.OAA10601@apollo.backplane.com> <Pine.BSF.4.10.9910281455250.11610-100000@current1.whistle.com> <19991029005450.A2757@apfel.de> <199910282234.PAA12655@apollo.backplane.com> <19991029011348.B2757@apfel.de>

Michael Beckmann wrote:
> 
> On Thu, Oct 28, 1999 at 03:34:53PM -0700, Matthew Dillon wrote:
> > :OK, so I know now that I can have pretty large files in the Terabyte range.
> > :Very nice. But I assume I cannot mmap anything like a 100 GB file ?
> > :
> > :Michael
> >
> >     Intel cpu's only have a 4G address space.  You are limited to around
> >     a 2G mmap()ing.  You can mmap() any portion of a larger file but you
> >     cannot mmap() the whole file at once.
> >
> >     The easiest thing to do is to simply create a number of fixed-sized files
> >     and tell CNFS to use them.
> 
> Here is the problem:
> When you want to have 500 GB of storage, you will need 250 files. In the current
> implementation of nnrpd, this will need 250 file descriptors per nnrpd. This will

I think this situation will bring you back to where you started: you
will be able to map one whole file, but only one file at a time,
not 250 files at once. So it would be simpler and more efficient
to have only one big file and map pieces of it as needed. I would
also suggest making these pieces much smaller than 2G: then you will
be able to map a number of them at once, and when you decide to
replace one mapped piece with another you won't have to re-create
the page tables for the whole 2G of address space. I guess the best
size for these pieces depends on the typical request size of your
application.
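
Roughly what I have in mind (an untested sketch; the 64M piece size
and the struct/function names are just examples to illustrate the
idea - tune the size to your typical request):

/* Map only the piece of a big file around the requested offset,
 * instead of mapping the whole file.  off_t is 64-bit on FreeBSD,
 * so offsets far beyond 4G are fine even on i386. */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define CHUNK   (64UL * 1024 * 1024)    /* 64M, much less than 2G */

struct window {
        void    *base;  /* start of the mapped piece */
        off_t   start;  /* file offset where the piece begins */
        size_t  len;    /* length actually mapped */
};

/* map the piece of fd that contains file offset off */
int
map_window(int fd, off_t off, struct window *w)
{
        struct stat st;

        if (fstat(fd, &st) < 0)
                return (-1);
        w->start = off - (off % CHUNK); /* chunk (and page) aligned */
        if (w->start >= st.st_size)
                return (-1);
        w->len = CHUNK;
        if (w->start + (off_t)w->len > st.st_size)
                w->len = st.st_size - w->start;
        w->base = mmap(NULL, w->len, PROT_READ, MAP_SHARED,
            fd, w->start);
        return (w->base == MAP_FAILED ? -1 : 0);
}

/* drop a piece when it is no longer needed */
void
unmap_window(struct window *w)
{
        munmap(w->base, w->len);
}

A reader process would keep a handful of such windows mapped at once
and recycle the least recently used one, so most requests hit an
already-mapped piece and mmap()/munmap() stay off the fast path.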

> limit the number of readers that can be supported on a system, because a nnrpd is
> spawned for each reader. I was told that nnrpd can be hacked to only consume file
> descriptors when really needed, but it's supposed to have a performance penalty.

Frequently mapping and unmapping whole 2G files would impose a much
higher performance penalty. Also, if you plan to have many processes
doing mmaps, don't forget that each of them would consume some
kernel address space for its page tables. With big mmaps you would
exhaust your kernel address space very quickly (an example of
this is that Oracle on UnixWare uses special calls with
4MB physical pages to map its SGA; otherwise the kernel virtual memory
on a machine with a lot of physical memory gets exhausted very quickly,
with terrible consequences). The page tables for such amounts
of memory would consume the kernel virtual memory much faster than
250 file descriptors per process.
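
To put rough numbers on it (back-of-the-envelope, assuming i386 with
4K pages and 4-byte page table entries):

    2G mapped per process  -> 2G / 4K = 512K pages
                           -> 512K * 4 bytes = 2M of page tables
    250 nnrpd processes    -> 250 * 2M = 500M just for the page
                              tables backing those mappings

while a file descriptor costs each process only a table slot plus a
small kernel structure, well under a kilobyte.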

> That's why I'm looking for a way of having large mmap'able files. Are you saying
> that ALL Intel CPUs, including PIII, can only address 4 GB? I probably need to
> look at other architectures or solve this fd problem.

Pentiums can address more physical memory (again, UnixWare supports
up to 16GB of physical memory), at the price of some performance
penalty, but not more virtual memory. A 64-bit architecture such as
Alpha, UltraSPARC or HP PA-8000 may help you. But I guess what you
really need is to split your very big file into a number of pieces
of some reasonable size and map/unmap them as needed. The RISC
architectures (at least PA-RISC and RS/6000, but I think the others
too) use a different organisation of the page tables, so for them
mapping big memory areas in many processes should not be as big a
problem.

-SB

