From owner-freebsd-bugs@FreeBSD.ORG  Wed Jul  5 11:16:12 2006
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
X-Original-To: freebsd-bugs@FreeBSD.org
Delivered-To: freebsd-bugs@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3D83816A4DD;
	Wed,  5 Jul 2006 11:16:12 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DDE8643D45;
	Wed,  5 Jul 2006 11:16:11 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 7F36B46C23;
	Wed,  5 Jul 2006 07:16:11 -0400 (EDT)
Date: Wed, 5 Jul 2006 12:16:11 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Yar Tikhiy <yar@comp.chem.msu.su>
In-Reply-To: <200510260910.j9Q9AKtg075166@freefall.freebsd.org>
Message-ID: <20060705120908.Q18236@fledge.watson.org>
References: <200510260910.j9Q9AKtg075166@freefall.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-bugs@FreeBSD.org, bug-followup@FreeBSD.org
Subject: Re: kern/87255: Large malloc-backed mfs crashes the system
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
X-List-Received-Date: Wed, 05 Jul 2006 11:16:12 -0000

On Wed, 26 Oct 2005, Yar Tikhiy wrote:

> > In all cases it is a "don't do that then" class of problem.
>
> Yes, of course.  The question is whether we consider it normal for root to 
> have ability to panic the system using standard tools. "cat /dev/zero > 
> /dev/mem" still is the ultimate way to.  IMHO it is a key issue whether we 
> fall back at the academical/research stage where rough corners are OK and 
> the system is just a toy for eggheads, or we pretend our system is stable 
> and robust.  I doubt if an admin can crash the Windows NT kernel from the 
> userland using conventional interfaces.  I by no means expect this issue to 
> be resolved soon, but it's worth being reflected on at tea-time :-)
>
> Apropos, here's another reproducible crash induced by md:
>
> 	# mdconfig -a -t malloc -s 300m
> 	md0
> 	# dd if=/dev/urandom of=/dev/md0 bs=1
> 	dd: /dev/md0: Input/output error
> 	79+0 records in
> 	78+9 records out
> 	# reboot
> 	panic: kmem_malloc(4096): kmem_map too small: 86224896 total allocated
>
> Apparently, it is not a fault of md, just our kernel memory allocator allows 
> other kernel parts to starve it to death.

I'm not sure I entirely go along with this interpretation.  The answer to the 
question "What to do when the kernel runs out of address space?" is not easily 
found.  The "problem" is that md performs potentially unbounded allocation of 
a quite bounded resource -- remember that resource deadlocks are very real; 
sometimes it takes memory to release memory (abstractly, think of memory 
allocation as locking).  UMA supports allocator-enforced resource limits, 
which can be requested by the consumer using uma_zone_set_max().  md(4) should 
probably be using that interface and requesting a resource limit.
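
To make the idea of an allocator-enforced limit concrete, here is a minimal 
userland sketch in C.  It is not UMA code and does not call 
uma_zone_set_max(); the names bounded_zone, bz_init, bz_alloc and bz_free are 
hypothetical and exist only for illustration.  The point is that the cap is 
fixed when the zone is set up, and the consumer gets NULL back once it is 
reached rather than being allowed to exhaust the underlying resource.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland stand-in for a zone with an item limit. */
struct bounded_zone {
	size_t item_size;	/* bytes per item */
	size_t max_items;	/* hard cap, fixed when the zone is set up */
	size_t in_use;		/* items currently handed out */
};

void
bz_init(struct bounded_zone *z, size_t item_size, size_t max_items)
{
	assert(z != NULL && item_size > 0);
	z->item_size = item_size;
	z->max_items = max_items;
	z->in_use = 0;
}

/* Returns NULL once the cap is reached; the caller must handle that. */
void *
bz_alloc(struct bounded_zone *z)
{
	void *p;

	if (z->in_use >= z->max_items)
		return (NULL);
	p = malloc(z->item_size);
	if (p != NULL)
		z->in_use++;
	return (p);
}

/* Returning an item makes room under the cap again. */
void
bz_free(struct bounded_zone *z, void *p)
{
	if (p == NULL)
		return;
	assert(z->in_use > 0);
	free(p);
	z->in_use--;
}
```

With a cap of two items, the third bz_alloc() fails until one of the first 
two is freed.  The consumer still has to decide what to do with that failure.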

There is also a problem then regarding what happens when md(4) runs out of 
resources to allocate when it has already "promised" that it's a disk of a 
certain size up the stack.  I.e., if the result isn't a panic, then how will 
md(4) handle failure?  Most file systems will not be happy when they get EIO, 
so then perhaps the problem is that md(4) provides an abstraction for a 
non-sparse device up the storage stack, but is in fact over-committing.  This 
suggests that the size of an md device should be strictly bounded if it 
is malloc-backed.  Picking that maximum bound is also tricky.  This is why, in 
practice, we recommend using swap-backed md devices, so that the pages 
associated with the md device can be swapped out under memory pressure, and 
ensuring that the swap system has enough memory to fully back the md device.
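
The over-commit problem can be sketched in userland C as well.  This is an 
illustration under stated assumptions, not md(4) code: lazy_disk, ld_create, 
ld_write, ld_destroy and the fixed sector "budget" are all hypothetical.  The 
device advertises a size up front but backs a sector only on first write, so 
a write can fail with EIO long after the size was promised.  The strict flag 
shows the alternative of refusing, at creation time, to promise more than 
can be backed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical lazily backed "disk" with a fixed backing budget. */
struct lazy_disk {
	size_t nsectors;		/* size promised to the consumer */
	size_t budget;			/* sectors that can really be backed */
	size_t backed;			/* sectors allocated so far */
	unsigned char **sectors;	/* NULL until first written */
};

/* With strict != 0, refuse up front to promise more than the budget. */
int
ld_create(struct lazy_disk *d, size_t nsectors, size_t budget, int strict)
{
	if (nsectors == 0)
		return (EINVAL);
	if (strict && nsectors > budget)
		return (ENOSPC);
	d->sectors = calloc(nsectors, sizeof(*d->sectors));
	if (d->sectors == NULL)
		return (ENOMEM);
	d->nsectors = nsectors;
	d->budget = budget;
	d->backed = 0;
	return (0);
}

/* Writes one sector; fails with EIO if the promise cannot be kept. */
int
ld_write(struct lazy_disk *d, size_t sector, const void *buf)
{
	if (sector >= d->nsectors)
		return (EINVAL);
	if (d->sectors[sector] == NULL) {
		if (d->backed >= d->budget)
			return (EIO);	/* over-committed: nothing left */
		d->sectors[sector] = malloc(SECTOR_SIZE);
		if (d->sectors[sector] == NULL)
			return (EIO);
		d->backed++;
	}
	memcpy(d->sectors[sector], buf, SECTOR_SIZE);
	return (0);
}

void
ld_destroy(struct lazy_disk *d)
{
	size_t i;

	assert(d != NULL);
	for (i = 0; i < d->nsectors; i++)
		free(d->sectors[i]);
	free(d->sectors);
	d->sectors = NULL;
}
```

A four-sector disk created non-strictly with a budget of two accepts writes 
to two distinct sectors and returns EIO on the third; created strictly, the 
same request is rejected by ld_create() before any consumer can rely on it.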

Robert N M Watson
Computer Laboratory
University of Cambridge