From: Yar Tikhiy
To: Robert Watson
Cc: freebsd-bugs@FreeBSD.org, bug-followup@FreeBSD.org
Date: Thu, 20 Jul 2006 18:52:34 +0400
Subject: Re: kern/87255: Large malloc-backed mfs crashes the system
Message-ID: <20060720145234.GE95131@comp.chem.msu.su>
In-Reply-To: <20060705120908.Q18236@fledge.watson.org>
References: <200510260910.j9Q9AKtg075166@freefall.freebsd.org> <20060705120908.Q18236@fledge.watson.org>

On Wed, Jul 05, 2006 at 12:16:11PM +0100, Robert Watson wrote:
> On Wed, 26 Oct 2005, Yar Tikhiy wrote:
>
> >> In all cases it is a "don't do that then" class of problem.
> >
> >Yes, of course.  The question is whether we consider it normal for root to
> >have the ability to panic the system using standard tools.
> >"cat /dev/zero > /dev/mem" still is the ultimate way to.  IMHO it is a key issue whether we
> >fall back at the academical/research stage where rough corners are OK and
> >the system is just a toy for eggheads, or we pretend our system is stable
> >and robust.  I doubt if an admin can crash the Windows NT kernel from the
> >userland using conventional interfaces.  I by no means expect this issue
> >to be resolved soon, but it's worth being reflected on at tea-time :-)
> >
> >Apropos, here's another reproducible crash induced by md:
> >
> >	# mdconfig -a -t malloc -s 300m
> >	md0
> >	# dd if=/dev/urandom of=/dev/md0 bs=1
> >	dd: /dev/md0: Input/output error
> >	79+0 records in
> >	78+9 records out
> >	# reboot
> >	panic: kmem_malloc(4096): kmem_map too small: 86224896 total
> >	allocated
> >
> >Apparently, it is not a fault of md, just our kernel memory allocator
> >allows other kernel parts to starve it to death.
>
> I'm not sure I entirely go along with this interpretation.  The answer to
> the question "What to do when the kernel runs out of address space?" is not
> easily found.  The "problem" is that md performs potentially unbounded
> allocation of a quite bounded resource -- remember that resource deadlocks
> are very real, sometimes it takes memory to release memory (abstractly,
> think of memory allocation as locking).  UMA supports allocator-enforced
> resource limits, which can be requested by the consumer using
> uma_zone_set_max().  md(4) should probably be using that interface and
> requesting a resource limit.

The panic doesn't seem to be on a critical path in the kernel; it's in
kmem_malloc(), which is essentially a utility routine.  Couldn't the
allocation attempt simply fail and let the caller decide what to do?
In fact, it can fail, but only in the M_NOWAIT case:

	if (vm_map_findspace(map, vm_map_min(map), size, &addr)) {
		vm_map_unlock(map);
		if ((flags & M_NOWAIT) == 0)
			panic("kmem_malloc(%ld): kmem_map too small: %ld total allocated",
				(long)size, (long)map->size);
		return (0);
	}

It looks like we have to panic there merely because malloc(9) promises
to succeed when waiting is allowed, yet here there is no chance of
success.  Isn't that a design issue?

> There is also a problem then regarding what happens when md(4) runs out of
> resources to allocate when it has already "promised" that it's a disk of a
> certain size up the stack.  I.e., if the result isn't a panic, then how
> will md(4) handle failure?  Most file systems will not be happy when they
> get EIO, so then perhaps the problem is that md(4) provides an abstraction
> for a non-sparse device up the storage stack, but is in fact
> over-committing.  This suggests either that the size of an md device should
> be strictly bounded if it is malloc-backed.  Picking that maximum bound is
> also tricky.  This is why, in practice, we recommend using swap-backed md
> devices, so that the pages associated with the md device can be swapped out
> under memory pressure, and that the swap system have enough memory to fully
> back the md device.

Perhaps md(4) shouldn't over-commit in malloc mode?  It will waste
precious physical memory, but that is what malloc mode is supposed to
do anyway.  And one can't use swap-backed md when diskless.

-- 
Yar