From owner-freebsd-scsi@FreeBSD.ORG  Sun Jan 18 13:57:26 2004
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id AE93E16A4CE; Sun, 18 Jan 2004 13:57:26 -0800 (PST)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 3226643D3F; Sun, 18 Jan 2004 13:57:25 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	i0ILvN82097288;	Sun, 18 Jan 2004 13:57:23 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id i0ILvNQe097287;
	Sun, 18 Jan 2004 13:57:23 -0800 (PST)
	(envelope-from dillon)
Date: Sun, 18 Jan 2004 13:57:23 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200401182157.i0ILvNQe097287@apollo.backplane.com>
To: Scott Long <scottl@freebsd.org>
References: <Pine.LNX.4.44.0401161607260.26554-100000@Xenon.Stanford.EDU>
	<20040118160802.GC32115@FreeBSD.org.ua>
	<200401181844.i0IIivlQ096389@apollo.backplane.com>
	<400AE3AB.1070102@freebsd.org>
	<200401181957.i0IJvFTe096883@apollo.backplane.com>
	<400AEC20.70709@freebsd.org>
cc: freebsd-hackers@freebsd.org
cc: Paul Twohey <twohey@CS.Stanford.EDU>
cc: Ruslan Ermilov <ru@freebsd.org>
cc: scsi@freebsd.org
Subject: Re: [CHECKER] bugs in FreeBSD
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 18 Jan 2004 21:57:26 -0000


:>     I know cam uses some helper threads so I am not entirely sure about
:>     the context the cam_sim_alloc() calls are being made in, but if they
:>     do not create I/O stalls for already-operational SCSI devices then I
:>     am inclined (in DFly anyway) to simply make the malloc in
:>     cam_sim_alloc() M_WAITOK.
:> 
:> 					-Matt
:> 					Matthew Dillon 
:> 					<dillon@backplane.com>
:> 
:
:In the 4.x case, so long as the driver doesn't do an splcam() or somehow
:block hardware interrupts before calling cam_sim_alloc() you are
:probably fine.  For 5.x, you might run into Giant problems.
:
:Scott

    Well, I don't see how a spl or Giant could possibly have anything to
    do with memory deadlocks.  Both are dropped when a thread blocks so the
    worst that happens is that you add some latency.

    The culprit is almost guaranteed to be blocking in the interrupt threads
    themselves or blocking in serialized multi-device-handling threads
    such as some of CAM's helper threads.  Blocking in either could deadlock
    the system in a low memory situation.

    But what people seem to have done... using M_NOWAIT with very little 
    regard for the side effects that occur when malloc() might then fail,
    is not the right solution.  If the CAM code cannot use a blocking malloc
    for a critical structure allocation then it certainly can't use a 
    non-blocking malloc that might then fail as a workaround!  Some other
    solution is needed for those situations (something like the MPIPE 
    solution I came up with to guarantee the availability of I/O request
    structures in interrupt service routines).
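
    As a rough illustration of the preallocation idea, here is a minimal
    userland sketch in C.  It is not the MPIPE code and does not use its
    interface; struct io_req, struct req_pool and the req_pool_*() functions
    are hypothetical names made up for this example.  The point it shows is
    that the only allocation that can fail happens once, at init time, in a
    context that is allowed to block or to fail cleanly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request structure standing in for an I/O request. */
struct io_req {
	struct io_req *next;	/* free-list link, used only while free */
	int tag;
};

/* Hypothetical pool: all memory is reserved once, up front. */
struct req_pool {
	struct io_req *slab;	/* backing storage for every request */
	struct io_req *free;	/* singly linked list of free requests */
	size_t nfree;		/* number of requests on the free list */
};

/* Call from a context that may block or fail (e.g. device attach). */
int
req_pool_init(struct req_pool *p, size_t count)
{
	size_t i;

	if (count == 0)
		return (-1);
	p->slab = calloc(count, sizeof(*p->slab));
	if (p->slab == NULL)
		return (-1);
	p->free = NULL;
	for (i = 0; i < count; i++) {
		p->slab[i].next = p->free;
		p->free = &p->slab[i];
	}
	p->nfree = count;
	return (0);
}

/* Never calls the allocator; NULL means the pool itself is exhausted. */
struct io_req *
req_pool_get(struct req_pool *p)
{
	struct io_req *r = p->free;

	if (r != NULL) {
		p->free = r->next;
		r->next = NULL;
		p->nfree--;
	}
	return (r);
}

/* Return a request to the pool so a later get can reuse it. */
void
req_pool_put(struct req_pool *p, struct io_req *r)
{
	assert(r != NULL);
	r->next = p->free;
	p->free = r;
	p->nfree++;
}

/* Release the backing storage; the pool must be re-initialized to reuse. */
void
req_pool_destroy(struct req_pool *p)
{
	free(p->slab);
	p->slab = NULL;
	p->free = NULL;
	p->nfree = 0;
}
```

    In this sketch an exhausted pool still returns NULL, so the pool has to
    be sized for the maximum number of requests in flight.  Locking is also
    left out; a real kernel version would need to protect the free list.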

    What it comes down to for cam_sim_alloc() is, again, the context in which
    it is called.  Can it be called from a serialized cam thread or an
    interrupt thread in a way that could potentially block I/O operations for
    devices other than the one trying to attach?  If so then there's a real
    problem that needs to be solved.  If not then M_WAITOK can be safely
    used in this particular situation and the NULL case no longer needs to be
    worried about.
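
    The difference between the two flags can be sketched with a small
    userland mock in C.  Nothing here is the kernel malloc() or the real
    cam_sim_alloc(); MOCK_NOWAIT, MOCK_WAITOK, mock_malloc() and
    mock_sim_alloc() are hypothetical stand-ins, and "waiting for memory" is
    simulated by simply crediting the fake free-memory counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mock flags; named after the kernel flags but defined only here. */
enum mock_flags { MOCK_NOWAIT, MOCK_WAITOK };

/* Simulated amount of free memory, in bytes. */
static size_t mock_free_bytes;

void
mock_set_free(size_t n)
{
	mock_free_bytes = n;
}

/* Stand-in for sleeping until memory has been reclaimed. */
static void
mock_wait_for_memory(size_t need)
{
	mock_free_bytes += need;
}

/*
 * MOCK_NOWAIT fails immediately under simulated memory pressure.
 * MOCK_WAITOK "sleeps" until the request can be satisfied.
 * The host malloc() is assumed to succeed in this sketch.
 */
void *
mock_malloc(size_t size, enum mock_flags flags)
{
	if (mock_free_bytes < size) {
		if (flags == MOCK_NOWAIT)
			return (NULL);
		mock_wait_for_memory(size);
	}
	mock_free_bytes -= size;
	return (malloc(size));
}

/* Hypothetical structure standing in for a SIM. */
struct mock_sim {
	int unit;
};

struct mock_sim *
mock_sim_alloc(int unit, enum mock_flags flags)
{
	struct mock_sim *sim = mock_malloc(sizeof(*sim), flags);

	/* Under the assumption above, only MOCK_NOWAIT can get here. */
	if (sim == NULL)
		return (NULL);
	sim->unit = unit;
	return (sim);
}
```

    With MOCK_NOWAIT every caller has to carry a failure path; with
    MOCK_WAITOK the caller instead accepts that it may sleep, which is only
    acceptable if sleeping there cannot stall I/O for other devices.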

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>