From owner-freebsd-stable@FreeBSD.ORG Wed Jan 29 22:16:45 2014
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTPS id 48AF654D;
    Wed, 29 Jan 2014 22:16:45 +0000 (UTC)
Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mx1.freebsd.org (Postfix) with ESMTPS id 09AE71B41;
    Wed, 29 Jan 2014 22:16:43 +0000 (UTC)
Received: from nargothrond.kdm.org (localhost [127.0.0.1])
    by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id s0TMFEhD048604;
    Wed, 29 Jan 2014 15:15:14 -0700 (MST)
    (envelope-from ken@nargothrond.kdm.org)
Received: (from ken@localhost)
    by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id s0TMFEMR048603;
    Wed, 29 Jan 2014 15:15:14 -0700 (MST)
    (envelope-from ken)
Date: Wed, 29 Jan 2014 15:15:14 -0700
From: "Kenneth D. Merry"
To: wollman@csail.mit.edu
Subject: Re: Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 == BOOM!)
Message-ID: <20140129221514.GA47535@nargothrond.kdm.org>
References: <21225.19508.683025.581620@khavrinen.csail.mit.edu>
    <201401292137.s0TLbD5G006716@hergotha.csail.mit.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201401292137.s0TLbD5G006716@hergotha.csail.mit.edu>
User-Agent: Mutt/1.4.2i
Cc: hps@bitfrost.no, freebsd-scsi@freebsd.org, scottl@freebsd.org,
    freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Wed, 29 Jan 2014 22:16:45 -0000

On Wed, Jan 29, 2014 at 16:37:13 -0500, wollman@csail.mit.edu wrote:
> In article <52E94FC2.1010901@bitfrost.no>, hps@bitfrost.no writes:
> >To me this sounds like someone is writing outside their assigned area.
> >
> >options DEBUG_REDZONE
>
> hselasky@ nails it!  The mps(4) changes in stable/9 r254938 reliably
> cause a GPF during boot in non-debugging kernels, but adding
> DEBUG_REDZONE is sufficient to prevent the fault.  Whichever heap
> allocation is being overrun does *not* ever get freed: there are no
> redzone messages on the console.  (It also boots much faster with the
> new probing code, which is certainly a plus for debugging.)
>
> I can confirm that the tip of stable/9 (r261256) also works with
> DEBUG_REDZONE and fails without it.  Only trouble is that I need to do
> performance testing, which DEBUG_REDZONE is not exactly going to help
> with.

Hmm.  What does vmstat -m show for the mps malloc bucket?

Are you booting off of the controller?  If not, could you try building
mps as a module and unloading it?  Perhaps the memory would get freed
when the module is unloaded and the redzone code would show where the
problem is.

How many drives do you have in the system, and how many of them are SAS
vs. SATA?

I haven't seen this problem, but it may be that we've gotten lucky or
don't have the particular set of factors that you have.  We have tested
with more than 200 drives connected, but they were all SAS.

I'll take a look and see if I can see anything that looks suspicious.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
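
For readers following the thread, here is a rough sketch of the idea behind
a redzone-style check. It is not the FreeBSD DEBUG_REDZONE code: the names
guarded_alloc, guarded_free, GUARD_SIZE and GUARD_BYTE are hypothetical and
chosen only for illustration. The sketch assumes guard bytes placed after
each allocation and inspected only when the allocation is released. Under
that assumption, a small overrun lands in the padding rather than in a
neighbouring allocation, and nothing is reported until the buffer is freed,
which matches the suggestion above to unload the module so the memory gets
freed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16    /* hypothetical guard width, not the kernel's value */
#define GUARD_BYTE 0x42  /* hypothetical fill pattern */

/* Small header kept in front of the user buffer so the check knows its size. */
struct guard_hdr {
    size_t size;
};

/* Allocate `size` usable bytes followed by GUARD_SIZE bytes of GUARD_BYTE. */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(struct guard_hdr) + size + GUARD_SIZE);

    if (raw == NULL)
        return NULL;
    ((struct guard_hdr *)raw)->size = size;
    memset(raw + sizeof(struct guard_hdr) + size, GUARD_BYTE, GUARD_SIZE);
    return raw + sizeof(struct guard_hdr);
}

/*
 * Release the buffer and return how many guard bytes were overwritten.
 * The guard is inspected only here, so an overrun of a buffer that is
 * never freed is never reported.
 */
size_t guarded_free(void *ptr)
{
    unsigned char *user = ptr;
    struct guard_hdr *hdr =
        (struct guard_hdr *)(user - sizeof(struct guard_hdr));
    size_t damaged = 0;

    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (user[hdr->size + i] != GUARD_BYTE)
            damaged++;
    }
    free(hdr);
    return damaged;
}
```

A caller that writes a few bytes past the end of a guarded_alloc() buffer
would see a non-zero count from guarded_free(), while a buffer that is
written within bounds would report zero.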