Date:      Sun, 30 Sep 2001 03:55:29 -0500
From:      Alfred Perlstein <bright@mu.org>
To:        Vladimir Dozen <vladimir-dozen@mail.ru>
Cc:        Matt Dillon <dillon@earth.backplane.com>, Wilko Bulte <wkb@freebie.xs4all.nl>, hackers@FreeBSD.ORG
Subject:   Re: VM: dynamic swap remapping (patch)
Message-ID:  <20010930035529.G59854@elvis.mu.org>
In-Reply-To: <20010930120328.A534@eix.do-labs.spb.ru>; from vladimir-dozen@mail.ru on Sun, Sep 30, 2001 at 12:03:28PM +0000
References:  <20010929155941.A291@eix.do-labs.spb.ru> <20010929071024.Q59854@elvis.mu.org> <20010929141349.A80876@freebie.xs4all.nl> <200109291653.f8TGrRR37689@earth.backplane.com> <20010929232953.B341@eix.do-labs.spb.ru> <20010929175653.Z59854@elvis.mu.org> <20010930120328.A534@eix.do-labs.spb.ru>

* Vladimir Dozen <vladimir-dozen@mail.ru> [010930 03:02] wrote:
> ehlo.
> 
> > My suggestion, (but not my final say, i'm still open to ideas):
> > 
> >    Implement a memory status signal to notify processes of changes
> >    in the relative amount of system memory.
> > 
> >    When memory reaches a low or high watermark, the signal is
> >    broadcast to all running processes.
> > 
> >    The default disposition will be to ignore the signal.
> > 
> >    The signal will be named SIGMEMINFO.  (SIGXfoo means
> >    'process has exceeded resource foo')
> 
>   Agreed. As for SIG_IGN, can anyone tell me -- can I force
>   existing application to use my signal handler? For example,
>   by preallocating some shared library? If so, there are no
>   contras for ignoring signal by default.

Yes, it's kind of evil, but you need to do this:

Make a .c file that contains your signal handler and a function
called _init that installs it.  You might also want to export
whether the system is below the low watermark via a sysctl variable,
so that at startup you can set a flag to do things like make a
malloc wrapper fail...

Compile it like so:
gcc -fPIC -o t2.So -c t2.c ; ld -shared -o t2.so t2.So

Then install it someplace and set this in the environment:

LD_PRELOAD=/path/to/where/you/put/it/t2.so

All non-setuid and non-setgid programs will respect this.

Now, just because I told you that doesn't mean you should run off
and use your solution.  I hope you take the time to consider
what I've got to say here. :)

> >    The signal will pass via the siginfo struct information
> >    such that the process can determine if the system has
> >    just exceeded the low watermark (danger) or has reclaimed
> >    down to the high watermark (enough free memory).
> 
>   Passing more info is always better. Agreed.

It's just a trick so you only need one signal instead of two
separate ones: SIGMEMLOW (memory is low) and SIGMEMOK (memory is
OK, i.e. enough is free).

> > a) over allocate swap a bit and set the low watermark carefully.
> > b) do the following enhancement:
> > 
> >      Provide a system whereby you can swap to the filesystem without
> >      additional upcalls/syscalls from userspace, basically, provide
> >      some means of paging to the filesystem automatically.
> > 
> >   then, set your lowwater mark to the size of your swap partition,
> >   now your system will alert your processes and automatically swap
> >   _anyone_ to the filesystem.
> > 
> > I really think that this would be more flexible and still allow
> > you to achieve what you want... What do you think?
> 
>   I can't say anything until I get the details. Sorry, English is neither
>   my native language nor one I use often, so I may easily miss important
>   details, but here are my random comments:
>   
>   Initially, I tried the same (I think) approach, but there were
>   some problems. Some kernel functions refused to work with VM objects
>   of processes other than curproc. I.e., it could be hard to work
>   with bigproc inside the swap daemon, and the swap daemon is the only place
>   where we can detect an OOM condition; that's why I used a signal to transfer
>   control to user space, and then back into the kernel -- already in another
>   process. Another reason to do it -- to make all limits and quotas work
>   automatically. Also, I did not want to keep the swap daemon busy too long.

Well, you can simply grab curproc and steal its context to do this.
Most likely you'll be in the context of a process that has faulted,
so the only trick is to write to the file instead of directly
to the device.  The idea is to let the next fault by any program
give you a 'curproc' in which to do the filesystem operations.

You could also wakeup() the swapper and set a flag to tell it to
allocate some filesystem space, the problem (as you've stated) is
that you can tie swapper up too long doing this.

>   Also, what means "over allocate swap a bit"? How to compute the value
>   of that bit? At what moment should we preallocate? Should we repeat
>   preallocation after getting SIGMEMINFO (himark)?

You're still thinking of the combined solution.  Just think of a
system where all you have right now is the signals I mentioned.

Now remember, your solution depends on spare space in the filesystem...

Your spare space is most likely not known.

Instead of depending on possibly nonexistent spare space, just
make your swap a little bigger and set your low watermark a bit
lower.  Now you have more time to do something when memory is low.

>   Also, you cannot set low mark to size of swap partition. To create
>   file-based swap you need some memory (file operations requires it).
>   So, low mark should be a bit lower (that's why I raised value of
>   nswap_lowat).

Yes, you're right, I didn't consider this.  You can start swapping
to the filesystem at an earlier point then.

>   Finally, if you want to over-allocate swap for every process in
>   the system, the whole swap can wind up consisting of only preallocations.
>   Resource management is the role of the kernel. Any hard reservation
>   interferes with that.

I'm not arguing for pre-allocation, I'm just saying that at a
certain point you're going to run out of swap.  If that swap is in
your filesystem you can still run out of filesystem space (if you
even have any at that point).  That's why you over-allocate and
set your low watermark appropriately.

So instead of having 1gig of swap, and depending on having N blocks
free in the filesystem, you allocate 1gig+N blocks of swap and
watch for the signal to start freeing resources.
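
The arithmetic can be sketched like this.  It is only one reading of
the paragraph above: it assumes the low watermark is expressed as "N
blocks of swap still free", and the struct and function names are
invented for the example.

```c
#include <stdint.h>

/* Hypothetical plan: the usual swap plus a reserve of N blocks. */
struct swap_plan {
    uint64_t total_blocks;    /* base + reserve */
    uint64_t low_water_free;  /* signal when free blocks drop to this */
};

static struct swap_plan plan_swap(uint64_t base_blocks,
                                  uint64_t reserve_blocks)
{
    struct swap_plan p;

    p.total_blocks = base_blocks + reserve_blocks;
    p.low_water_free = reserve_blocks;
    return p;
}

/* Nonzero once usage has eaten into the reserve. */
static int below_low_water(const struct swap_plan *p, uint64_t used_blocks)
{
    uint64_t free_blocks;

    free_blocks = used_blocks >= p->total_blocks
        ? 0
        : p->total_blocks - used_blocks;
    return free_blocks <= p->low_water_free;
}
```

With 4K blocks, 1gig is 262144 blocks; plan_swap(262144, 32768) gives
294912 blocks in total, and below_low_water() turns true exactly when
the original 1gig has been used up, leaving the reserve as breathing
room for processes reacting to the signal.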

Just think what happens if your filesystems are full and you run
out of swap...

-- 
-Alfred Perlstein [alfred@freebsd.org]
'Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.'




