From: Richard Todd
To: freebsd-current@freebsd.org
Date: Sun, 31 May 2009 01:20:45 -0500
Subject: Bug in recent large_alloc changes to the ZFS zio code?

Okay, I'm looking at the recent changes to the ZFS zio code that change how data buffers are allocated (svn r192207). The old code for zio_data_buf_alloc just called kmem_alloc (the Solaris compatibility one), which in turn called malloc() with M_WAITOK, so it was always guaranteed to get a valid, non-NULL pointer. Fair enough. The new code has an alternate code path: in "arc_large_memory_enabled" mode, it calls the new function zio_large_malloc instead. zio_large_malloc in turn tries a few times to allocate the required pages using vm_phys_alloc_contig, but if that fails, it goes ahead and returns NULL.
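The difference between the two code paths can be modeled in user space. This is a minimal sketch, not the actual ZFS code: fake_contig_alloc, large_memory_enabled, contig_failures_left and LARGE_ALLOC_TRIES are hypothetical stand-ins for vm_phys_alloc_contig, arc_large_memory_enabled and the retry loop in zio_large_malloc, and plain malloc() stands in for the kernel allocators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob standing in for arc_large_memory_enabled. */
static bool large_memory_enabled = true;

/* Hypothetical failure injector: while positive, the simulated
 * contiguous allocator fails and decrements it. */
static int contig_failures_left = 0;

/* Simulated stand-in for vm_phys_alloc_contig(): may return NULL. */
static void *fake_contig_alloc(size_t size)
{
    if (contig_failures_left > 0) {
        contig_failures_left--;
        return NULL;
    }
    return malloc(size);
}

/* Models the old kmem_alloc()/malloc(..., M_WAITOK) contract: the
 * caller never sees NULL. The kernel would sleep until memory is
 * available; this user-space model aborts instead. */
static void *waitok_alloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        abort();
    return p;
}

/* Hypothetical retry count ("tries a few times"). */
#define LARGE_ALLOC_TRIES 3

/* Models the new large-allocation path: retry, then give up. */
static void *large_alloc(size_t size)
{
    for (int i = 0; i < LARGE_ALLOC_TRIES; i++) {
        void *p = fake_contig_alloc(size);
        if (p != NULL)
            return p;
    }
    return NULL; /* the failure mode existing callers never expected */
}

/* Models zio_data_buf_alloc choosing between the two paths. */
static void *data_buf_alloc(size_t size)
{
    if (large_memory_enabled)
        return large_alloc(size);
    return waitok_alloc(size);
}
```

With the large path enabled and enough injected failures, data_buf_alloc() returns NULL after exhausting its retries; with it disabled, the wait-style path never does.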
Here's the problem. As far as I can tell, none of the code that calls zio_data_buf_alloc checks whether the returned pointer is NULL, which I guess is reasonable, since the original code could never return NULL. However, the new large malloc code *can* return NULL, which causes the obvious problem. The other day I mentioned here a panic I'd seen in which, under sufficiently heavy load, the GEOM code complained that it had been given a NULL data pointer. It seems to me that this was likely because zio had tried to allocate a data buffer and gotten a NULL pointer instead.
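One way to keep existing callers safe is to make the allocator itself restore the "never NULL" guarantee. The sketch below is only an illustration of that idea, not the committed fix and not the only option (the alternative is to add NULL checks at every call site). All names here (large_alloc_may_fail, large_path_fails, fallbacks, data_buf_alloc_checked) are hypothetical, and malloc() again stands in for the kernel allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch that forces the large path to fail. */
static int large_path_fails = 0;

/* Hypothetical counter of how often the fallback was taken. */
static int fallbacks = 0;

/* Stand-in for the large-allocation path (zio_large_malloc in the
 * post): may return NULL. */
static void *large_alloc_may_fail(size_t size)
{
    return large_path_fails ? NULL : malloc(size);
}

/* Stand-in for the M_WAITOK path: never hands back NULL (aborts in
 * this user-space model where the kernel would sleep). */
static void *waitok_alloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        abort();
    return p;
}

/* If the large path fails, fall back to the waiting path so callers
 * keep the guarantee they were written against. */
static void *data_buf_alloc_checked(size_t size)
{
    void *p = large_alloc_may_fail(size);
    if (p == NULL) {
        fallbacks++;
        p = waitok_alloc(size);
    }
    assert(p != NULL); /* the contract unchecked callers rely on */
    return p;
}
```

In the kernel, a fallback like this would also need the matching free routine to know which allocator produced each buffer, since pages from vm_phys_alloc_contig and memory from malloc(9) are not released the same way; this model sidesteps that because both paths use malloc().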