Date: Tue, 12 Apr 2005 22:21:11 -0600
From: Scott Long <scottl@samsco.org>
To: David Sze <dsze@alumni.uwaterloo.ca>
Cc: mb@imp.ch
Subject: Re: [PATCH] Stability fixes for IPS driver for 4.x
Message-ID: <425C9E37.2010105@samsco.org>
In-Reply-To: <6.2.1.2.2.20050412234622.05a6daf8@mail.distrust.net>
References: <4257F20C.70004@samsco.org> <6.2.1.2.2.20050411005214.065dc018@mail.distrust.net> <425A0BB2.10704@samsco.org> <6.2.1.2.2.20050411234713.069afb28@mail.distrust.net> <425C12E3.5050205@samsco.org> <6.2.1.2.2.20050412234622.05a6daf8@mail.distrust.net>
David Sze wrote:
> At 12:26 PM 12/04/2005 -0600, Scott Long wrote this to All:
>
>> David Sze wrote:
>>
>>> At 11:31 PM 10/04/2005 -0600, Scott Long wrote this to All:
>>>
>>>> Making a driver PAE-ified means either teaching it to do 64-bit
>>>> scatter-gather (assuming that the peripheral hardware can do this
>>>> and that it's documented), or teaching the driver to correctly handle
>>>> EINPROGRESS from bus_dmamap_load() along with using the proper busdma
>>>> tag limits.  The strategy I took with 6.x/5.x was the second one since
>>>> I didn't have good IPS docs in front of me and I wanted it to follow
>>>> the APIs correctly.  I did test it with 8GB of memory and it performed
>>>> correctly under load.  I haven't taken a close enough look at your
>>>> MFC patch to say for sure if it's correct or not.  I'm not sure if
>>>> I'll have time to take another look in the next few days,
>>>> unfortunately.
>>>> Is there any chance you could test 5.x/6.0 under load with PAE just to
>>>> validate the assertion that it works correctly there?
>>>
>>>
>>> I had a chance to test 5.4-RC1 (i386) today with GENERIC, SMP, PAE,
>>> and SMP-PAE kernels (the last one is just PAE with "options SMP").
>>> To recap, the hardware is an IBM xSeries 346, Dual Xeon 3GHz
>>> (non-EM64T), ServeRAID-7K.
>>> GENERIC and SMP survived "make buildkernel", but PAE and SMP-PAE
>>> panicked reproducibly doing the same.  The DDB stack trace doesn't
>>> appear to be anywhere near the IPS driver though, so I'm way out of
>>> my league.
>>
>>
>> Darnit, hard to say if this is an existing bug in 5.4 or if it's a
>> bug/corruption in ips.  Can you re-run with PAE disabled?
>
>
> Works fine with PAE disabled (or at least I couldn't get it to panic),
> both UP and SMP kernels.
>
>
>> Would you be
>> willing to put the Giant lock back on top of the driver?  This would
>> mean modifying the call to bus_intr_config(), adding the D_GIANTNEEDED
>> flag to the disk structure in disk_create(), and switching the mutex
>> argument in bus_dma_tag_create() for the sg_dmatag tag.
>
>
> I put Giant back in as you described (patch attached), but it still
> panicked with PAE enabled, both UP and SMP kernels.  The stack trace was
> very similar; the fault address (0x24) and the top three stack frames
> were the same as without Giant:
>
> propagate_priority
> turnstile_wait
> _mtx_lock_sleep
>
> At this point I no longer have access to the hardware; the customer
> wanted his servers back.  They're going into the datacenter with
> RELENG_4 (w/IPS stability patch), without PAE (so the top ~900MB of his
> 4GB RAM is lost to PCI-X address space).
>
>

Crumbs, I see a potential problem.  I won't have time until this weekend
to sort it out, though.  Sorry this has become such a drawn-out affair;
I hope that your customer isn't too upset.

Scott
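
For readers following the thread, the "handle EINPROGRESS from bus_dmamap_load()" strategy quoted above can be sketched roughly as follows. This is a userland mock, not the ips driver and not the real busdma API: every `mock_*` name, the structures, and the `frozen` flag are hypothetical stand-ins invented for illustration. The only behavior taken from busdma is the general contract that a load may be deferred when mapping resources are short, in which case EINPROGRESS is returned and the callback runs later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real busdma API. */
typedef void mock_dma_callback(void *arg, int error);

struct mock_dma {
	int resources_free;          /* 0 simulates a bounce-buffer shortage */
	mock_dma_callback *pending;  /* callback deferred until resources free */
	void *pending_arg;
};

struct mock_cmd {
	int submitted;  /* set once the callback has handed the command off */
	int frozen;     /* driver holds back new I/O while a load is deferred */
};

/* Load callback: the only place where the command is really submitted. */
static void
mock_cmd_callback(void *arg, int error)
{
	struct mock_cmd *cmd = arg;

	if (error == 0)
		cmd->submitted = 1;
	cmd->frozen = 0;
}

/* Mock load: runs the callback now, or defers it and returns EINPROGRESS. */
static int
mock_dmamap_load(struct mock_dma *dma, mock_dma_callback *cb, void *arg)
{
	assert(cb != NULL);
	if (!dma->resources_free) {
		dma->pending = cb;
		dma->pending_arg = arg;
		return (EINPROGRESS);
	}
	cb(arg, 0);
	return (0);
}

/* Simulates resources becoming available: run the deferred callback. */
static void
mock_dma_resources_available(struct mock_dma *dma)
{
	mock_dma_callback *cb = dma->pending;

	dma->resources_free = 1;
	if (cb != NULL) {
		dma->pending = NULL;
		cb(dma->pending_arg, 0);
	}
}

/* Driver-side start routine: EINPROGRESS is not treated as a failure. */
static int
mock_start_io(struct mock_dma *dma, struct mock_cmd *cmd)
{
	int error = mock_dmamap_load(dma, mock_cmd_callback, cmd);

	if (error == EINPROGRESS) {
		cmd->frozen = 1;  /* wait for the callback; do not fail the I/O */
		return (0);
	}
	return (error);
}
```

The point of the sketch is only the control flow: the start routine must not assume the command has been submitted when the load call returns, and it must not report a deferred load as an error. How a real driver holds back further I/O while a load is pending is driver-specific and is not shown in this thread.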