Date: Wed, 5 Jul 2006 11:20:29 GMT
From: Robert Watson
To: freebsd-bugs@FreeBSD.org
Subject: Re: kern/87255: Large malloc-backed mfs crashes the system

The following reply was made to PR kern/87255; it has been noted by GNATS.

From: Robert Watson
To: Yar Tikhiy
Cc: freebsd-bugs@FreeBSD.org, bug-followup@FreeBSD.org
Subject: Re: kern/87255: Large malloc-backed mfs crashes the system
Date: Wed, 5 Jul 2006 12:16:11 +0100 (BST)

On Wed, 26 Oct 2005, Yar Tikhiy wrote:

> > In all cases it is a "don't do that then" class of problem.
>
> Yes, of course. The question is whether we consider it normal for root to
> have ability to panic the system using standard tools. "cat /dev/zero >
> /dev/mem" still is the ultimate way to. IMHO it is a key issue whether we
> fall back at the academical/research stage where rough corners are OK and
> the system is just a toy for eggheads, or we pretend our system is stable
> and robust. I doubt if an admin can crash the Windows NT kernel from the
> userland using conventional interfaces. I by no means expect this issue to
> be resolved soon, but it's worth being reflected on at tea-time :-)
>
> Apropos, here's another reproducible crash induced by md:
>
> # mdconfig -a -t malloc -s 300m
> md0
> # dd if=/dev/urandom of=/dev/md0 bs=1
> dd: /dev/md0: Input/output error
> 79+0 records in
> 78+9 records out
> # reboot
> panic: kmem_malloc(4096): kmem_map too small: 86224896 total allocated
>
> Apparently, it is not a fault of md, just our kernel memory allocator allows
> other kernel parts to starve it to death.

I'm not sure I entirely go along with this interpretation. The answer to the
question "What to do when the kernel runs out of address space?" is not
easily found. The "problem" is that md performs potentially unbounded
allocation of a quite bounded resource -- remember that resource deadlocks
are very real: sometimes it takes memory to release memory (abstractly, think
of memory allocation as locking).

UMA supports allocator-enforced resource limits, which can be requested by
the consumer using uma_zone_set_max(). md(4) should probably be using that
interface and requesting a resource limit.

There is also a problem then regarding what happens when md(4) runs out of
resources to allocate when it has already "promised" up the stack that it's
a disk of a certain size. I.e., if the result isn't a panic, then how will
md(4) handle failure? Most file systems will not be happy when they get EIO,
so perhaps the problem is that md(4) provides an abstraction for a non-sparse
device up the storage stack, but is in fact over-committing. This suggests
that the size of an md device should be strictly bounded if it is
malloc-backed.
Picking that maximum bound is also tricky. This is why, in practice, we
recommend using swap-backed md devices, so that the pages associated with the
md device can be swapped out under memory pressure, and that the swap system
have enough capacity to fully back the md device.

Robert N M Watson
Computer Laboratory
University of Cambridge