Date: Tue, 7 Nov 2000 08:08:19 -0800 (PST)
From: Matt Dillon <dillon@earth.backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Kirk McKusick <mckusick@mckusick.com>, arch@FreeBSD.ORG
Subject: Re: softdep panic due to blocked malloc (with traceback)
Message-ID: <200011071608.eA7G8Jb73998@earth.backplane.com>
References: <Pine.BSF.4.21.0011072254370.3075-100000@besplex.bde.org>

:... but I don't see how using malloc() in low-level i/o routines can be
:safe in general.  Deadlock seems to be possible if completion of output
:is necessary to free some pages.  Deadlock just usually doesn't occur,
:because the system attempts to reserve enough free pages to satisfy
:low-level memory allocations.
:
:Bruce

I have a complete solution to the low-memory deadlock problem under test
with Paul Saab, and DG has approved of the idea.  As soon as both Paul's
and my machines survive a night of extreme memory strain I'll make the
patches available generally.

This is how it works:

* The problem we face is that giving certain system processes special
  allocation privileges DOES NOT WORK, because non-system processes can
  still block on a low-memory issue while holding a vnode, and this will
  prevent system processes such as pageout from being able to flush any
  pages associated with that vnode, whether they can allocate memory
  or not.

* We remove all contrived 'low memory' limitations from any code which
  might be called with a locked vnode.  Specifically, the buffer cache
  code.

    - getblk() no longer blocks if it is low on buffers, only if it is
      out of buffers.

    - When the buffer cache code allocates a page, it is allowed to dig
      into the free memory reserve rather than block.

* All kernel MALLOCs called by filesystem support routines such as
  ffs_inode.c and ffs_softdep.c use M_USE_RESERVE, allowing the kernel
  malloc to dig into the memory reserve.

* If we are low on memory, the following occurs:

    - All major delayed write calls, bdwrite(), are turned into async
      calls, bawrite().

    - brelse() and bqrelse() free clean buffers and their underlying VM
      pages (VM pages go into the CACHE instead of the INACTIVE queue),
      recovering resources immediately.  This allows us to continue
      issuing I/O without limitation and yet not run out of memory.

    - The rest of the system uses the normal allocator flags and will
      block in a low memory situation.
But due to the new method of doing things, our paging I/O still operates
to free up new memory.

The gist of the solution is that I/O is able to continue when you hit
v_free_reserved.  While the rest of the system shudders, I/O still goes
on, which means the system can recover.

That's it in a nutshell.  The patches are modest and not complex...
actually fairly straightforward.  I haven't dealt with networking/NFS
issues yet, but I believe I have the main filesystem and softupdates
working under extreme *dirty* mmap and I/O loads.  I'll know when Paul
gets back to me on the overnight tests he ran.  Note that Paul and I have
been testing things for a week, with things usually failing within hours,
so it may not be today.  Or it may...

-Matt
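
The policy described in this message can be modelled roughly in user-space C. This is only an illustrative sketch, not code from the patches: the names page_pool, pool_alloc(), pool_release(), choose_write_mode() and the ALLOC_* and WRITE_* constants are all hypothetical, and a false return from pool_alloc() merely stands in for "the caller would block". The free_reserved field plays the role of v_free_reserved, and ALLOC_USE_RESERVE plays the role of M_USE_RESERVE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocation flags, loosely modelled on M_USE_RESERVE. */
enum alloc_flags { ALLOC_NORMAL = 0, ALLOC_USE_RESERVE = 1 };

/* Hypothetical outcome of a delayed-write request. */
enum write_mode { WRITE_DELAYED, WRITE_ASYNC };

struct page_pool {
    int free_pages;     /* pages currently free */
    int free_reserved;  /* reserve threshold (role of v_free_reserved) */
};

/* True once the pool has been drawn down to the reserve. */
static bool pool_is_low(const struct page_pool *p)
{
    return p->free_pages <= p->free_reserved;
}

/*
 * Try to take one page.  Returns true on success; a false return
 * stands in for "the caller would block" in a real kernel.
 */
static bool pool_alloc(struct page_pool *p, enum alloc_flags flags)
{
    if (p->free_pages == 0)
        return false;   /* truly out: even I/O paths must wait */
    if (pool_is_low(p) && !(flags & ALLOC_USE_RESERVE))
        return false;   /* ordinary callers stop at the reserve */
    assert(p->free_pages > 0);
    p->free_pages--;
    return true;
}

/* Releasing a clean buffer returns its page to the pool immediately. */
static void pool_release(struct page_pool *p)
{
    p->free_pages++;
}

/* Under memory pressure a delayed write is issued as an async write. */
static enum write_mode choose_write_mode(const struct page_pool *p)
{
    return pool_is_low(p) ? WRITE_ASYNC : WRITE_DELAYED;
}
```

In this model, callers on the I/O path pass ALLOC_USE_RESERVE and keep making progress below the reserve threshold, while everyone else is refused at the threshold. Pages handed back through pool_release() are what eventually let the ordinary callers proceed again.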