Date: Tue, 7 Nov 2000 08:08:19 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG
Subject: Re: softdep panic due to blocked malloc (with traceback)
Message-ID: <200011071608.eA7G8Jb73998@earth.backplane.com>
References: <Pine.BSF.4.21.0011072254370.3075-100000@besplex.bde.org>
:... but I don't see how using malloc() in low-level i/o routines can be
:safe in general. Deadlock seems to be possible if completion of output
:is necessary to free some pages. Deadlock just usually doesn't occur,
:because the system attempts to reserve enough free pages to satisfy
:low-level memory allocations.
:
:Bruce
I have a complete solution to the low-memory deadlock problem under
test with Paul Saab, and DG has approved the idea. As soon as both
Paul's and my machines survive a night of extreme memory strain, I'll
make the patches generally available.
This is how it works:
* The problem we face is that giving certain system processes
special allocation privileges DOES NOT WORK, because non-system
processes can still block on a low-memory condition while holding
a vnode, and this prevents system processes such as pageout
from flushing any pages associated with that vnode, whether or
not they can allocate memory.
* We remove all contrived 'low memory' limitations from any code
which might be called with a locked vnode, specifically the
buffer cache code.
- getblk() no longer blocks if it is low on buffers, only if it
is out of buffers.
- When the buffer cache code allocates a page, it is allowed
to dig into the free memory reserve rather than block.
* All kernel MALLOCs called by filesystem support routines such
as ffs_inode.c and ffs_softdep.c use M_USE_RESERVE, allowing
the kernel malloc to dig into the memory reserve.
* If we are low on memory, the following occurs:
- All major delayed write calls, bdwrite(), are turned into
async calls, bawrite().
- brelse() and bqrelse() free clean buffers and their underlying
VM pages (VM pages go into the CACHE instead of the INACTIVE
queue), recovering resources immediately.
This allows us to continue issuing I/O without limitation and
yet not run out of memory.
- The rest of the system uses the normal allocator flags and will
block in a low-memory situation, but with the new scheme our
paging I/O still proceeds and frees up memory.
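The allocation side of the scheme above (getblk() waiting only when it is out of buffers, and buffer-cache and filesystem allocations being allowed into the reserve) can be modeled roughly as follows. This is a hypothetical sketch, not FreeBSD code: `struct page_pool`, `pool_alloc()`, `ALLOC_USE_RESERVE` and `getblk_must_wait()` are illustrative names only, with `ALLOC_USE_RESERVE` standing in for the idea behind M_USE_RESERVE and `free_reserved` standing in for v_free_reserved.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not FreeBSD code. ALLOC_USE_RESERVE stands in for
 * the idea behind M_USE_RESERVE: the caller may dig into the reserve.
 */
#define ALLOC_USE_RESERVE 0x1

struct page_pool {
    int free_pages;    /* pages currently free */
    int free_reserved; /* reserve threshold, analogous to v_free_reserved */
};

/*
 * Try to take one page. Returns true on success. A false return means
 * an ordinary caller would have to sleep until pageout frees memory;
 * a reserve caller only gets false when the pool is completely empty.
 */
static bool pool_alloc(struct page_pool *p, int flags)
{
    /* Ordinary callers must leave the reserve untouched. */
    int floor = (flags & ALLOC_USE_RESERVE) ? 0 : p->free_reserved;

    if (p->free_pages > floor) {
        p->free_pages--;
        assert(p->free_pages >= 0);
        return true;
    }
    return false;
}

/*
 * getblk() rule as described: wait only when no buffers are left,
 * not merely when the free-buffer count is low.
 */
static bool getblk_must_wait(int bufs_free)
{
    return bufs_free == 0;
}
```

With `free_pages = 3` and `free_reserved = 2`, an ordinary request succeeds once and then fails at the reserve boundary, while `ALLOC_USE_RESERVE` requests keep succeeding until the pool reaches zero. That is the property that lets I/O issued with a locked vnode keep moving while everything else waits.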
The gist of the solution is that I/O is able to continue when you hit
v_free_reserved. While the rest of the system shudders, I/O still goes
on, which means the system can recover.
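The low-memory write and release policy described above (bdwrite() turned into bawrite(), and clean buffers released with their pages sent to the cache queue) can be sketched as a pair of decision functions. This is a hypothetical model, not FreeBSD's `struct buf` or VM code: `struct vm_state`, `low_on_memory()`, `choose_write_mode()` and `release_clean_pages()` are illustrative names, and treating "low on memory" as `free_pages <= free_reserved` is an assumption, since the post does not say exactly where the trigger sits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not FreeBSD's struct buf or vm_page. */
enum write_mode { WRITE_DELAYED, WRITE_ASYNC };
enum page_queue { QUEUE_INACTIVE, QUEUE_CACHE };

struct vm_state {
    int free_pages;
    int free_reserved; /* analogous to v_free_reserved */
};

/* Assumed trigger: we are "low" once free pages reach the reserve. */
static bool low_on_memory(const struct vm_state *vm)
{
    assert(vm->free_pages >= 0);
    return vm->free_pages <= vm->free_reserved;
}

/*
 * bdwrite() analogue: normally delay the write, but under memory
 * pressure start it right away (bawrite()-style) so the dirty buffer
 * becomes clean and reclaimable sooner.
 */
static enum write_mode choose_write_mode(const struct vm_state *vm)
{
    return low_on_memory(vm) ? WRITE_ASYNC : WRITE_DELAYED;
}

/*
 * brelse()/bqrelse() analogue for a clean buffer: under pressure its
 * pages go to the cache queue, where they can be reused immediately,
 * instead of the inactive queue.
 */
static enum page_queue release_clean_pages(const struct vm_state *vm)
{
    return low_on_memory(vm) ? QUEUE_CACHE : QUEUE_INACTIVE;
}
```

Keeping the policy as a pure function of the free-page state makes the two regimes easy to compare: with plenty of memory, writes stay delayed and released pages go inactive; at the reserve, writes start immediately and released pages become reusable at once.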
That's it in a nutshell. The patches are modest but not complex...
actually fairly straightforward. I haven't dealt with networking/NFS
issues yet, but I believe I have the main filesystem and softupdates
working under extreme *dirty* mmap and I/O loads. I'll know when Paul
gets back to me on the overnight tests he ran. Note that Paul and I
have been testing for a week, and things usually failed within
hours, so it may not be today. Or it may...
-Matt
