From: Scott Long
Date: Tue, 12 Apr 2005 22:21:11 -0600
To: David Sze
cc: Anthony Downer
cc: stable@freebsd.org
cc: mb@imp.ch
Subject: Re: [PATCH] Stability fixes for IPS driver for 4.x
Message-ID: <425C9E37.2010105@samsco.org>
In-Reply-To: <6.2.1.2.2.20050412234622.05a6daf8@mail.distrust.net>

David Sze wrote:
> At 12:26 PM 12/04/2005 -0600, Scott Long wrote this to All:
>
>> David Sze wrote:
>>
>>> At 11:31 PM 10/04/2005 -0600, Scott Long wrote this to All:
>>>
>>>> Making a driver PAE-ified means either teaching it to do 64-bit
>>>> scatter-gather (assuming that the peripheral hardware can do this
>>>> and that it's documented), or teaching the driver to correctly handle
>>>> EINPROGRESS from bus_dmamap_load() along with using the proper busdma
>>>> tag limits. The strategy I took with 6.x/5.x was the second one since
>>>> I didn't have good IPS docs in front of me and I wanted it to follow
>>>> the APIs correctly. I did test it with 8GB of memory and it performed
>>>> correctly under load. I haven't taken a close enough look at your
>>>> MFC patch to say for sure if it's correct or not. I'm not sure if
>>>> I'll have time to take another look in the next few days,
>>>> unfortunately.
>>>> Is there any chance you could test 5.x/6.0 under load with PAE just to
>>>> validate the assertion that it works correctly there?
>>>
>>> I had a chance to test 5.4-RC1 (i386) today with GENERIC, SMP, PAE,
>>> and SMP-PAE kernels (the last one is just PAE with "options SMP").
>>> To recap, the hardware is an IBM xSeries 346, Dual Xeon 3GHz
>>> (non-EM64T), ServeRAID-7K.
>>> GENERIC and SMP survived "make buildkernel", but PAE and SMP-PAE
>>> panicked reproducibly doing the same. The DDB stack trace doesn't
>>> appear to be anywhere near the IPS driver though, so I'm way out of
>>> my league.
>>
>> Darnit, hard to say if this is an existing bug in 5.4 or if it's a
>> bug/corruption in ips. Can you re-run with PAE disabled?
>
> Works fine with PAE disabled (or at least I couldn't get it to panic),
> both UP and SMP kernels.
>
>> Would you be
>> willing to put the Giant lock back on top of the driver? This would
>> mean modifying the call to bus_intr_config(), adding the D_GIANTNEEDED
>> flag to the disk structure in disk_create(), and switching the mutex
>> argument in bus_dma_tag_create() for the sg_dmatag tag.
>
> I put Giant back in as you described (patch attached), but it still
> panicked with PAE enabled, both UP and SMP kernels. The stack trace was
> very similar; the fault address (0x24) and the top three stack frames
> were the same as without Giant:
>
> propagate_priority
> turnstile_wait
> _mtx_lock_sleep
>
> At this point I no longer have access to the hardware; the customer
> wanted his servers back. They're going into the datacenter with
> RELENG_4 (w/IPS stability patch), without PAE (so the top ~900MB of his
> 4GB RAM is lost to PCI-X address space).
>

Crumbs, I see a potential problem. I won't have time until this weekend
to sort it out, though. Sorry this has become such a drawn-out affair;
I hope that your customer isn't too upset.

Scott
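
Note for readers of the thread: the second strategy quoted above
(correctly handling EINPROGRESS from bus_dmamap_load()) comes down to
treating EINPROGRESS as "the callback will run later", not as a failure,
and holding back new commands until it does. The following is only a toy
userland simulation of that control flow, not driver code and not the
busdma API. MockDmaEngine, MockDriver, and all of their methods are
hypothetical names invented for illustration, and modelling the scarce
resource as a pool of bounce pages is an assumption of the sketch.

```python
import errno
from collections import deque


class MockDmaEngine:
    """Hypothetical stand-in for a DMA mapping layer (NOT the FreeBSD API)."""

    def __init__(self, free_bounce_pages):
        self.free_bounce_pages = free_bounce_pages
        self.deferred = deque()  # loads waiting for resources, oldest first

    def load(self, pages_needed, callback, arg):
        """Map a buffer; run callback now, or defer it and return EINPROGRESS."""
        if pages_needed <= self.free_bounce_pages:
            self.free_bounce_pages -= pages_needed
            callback(arg, error=0)
            return 0
        self.deferred.append((pages_needed, callback, arg))
        return errno.EINPROGRESS

    def release(self, pages):
        """Return pages to the pool and run any deferred loads that now fit."""
        self.free_bounce_pages += pages
        while self.deferred and self.deferred[0][0] <= self.free_bounce_pages:
            need, callback, arg = self.deferred.popleft()
            self.free_bounce_pages -= need
            callback(arg, error=0)


class MockDriver:
    """Hypothetical driver front end that tolerates deferred callbacks."""

    def __init__(self, dma):
        self.dma = dma
        self.frozen = False   # True while a deferred load is outstanding
        self.submitted = []   # commands handed to the (imaginary) hardware
        self.failed = []

    def _dma_callback(self, cmd, error):
        if error:
            self.failed.append(cmd)
            return
        self.submitted.append(cmd)
        self.frozen = False   # a deferred load completed; accept work again

    def start(self, cmd, pages_needed):
        """Return True if the command was accepted (possibly deferred)."""
        if self.frozen:
            return False      # caller should requeue and retry later
        rc = self.dma.load(pages_needed, self._dma_callback, cmd)
        if rc == errno.EINPROGRESS:
            self.frozen = True  # not an error: the callback fires later
            return True
        if rc != 0:
            self.failed.append(cmd)
            return False
        return True
```

In this sketch a command whose mapping is deferred is still counted as
accepted; the driver simply refuses further commands until the deferred
callback has run, which preserves ordering. Whether the real ips driver
freezes its queue in exactly this way is not stated in the thread.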