Message-ID: <3D12FBAB.8C676DA9@alogis.com>
Date: Fri, 21 Jun 2002 12:10:51 +0200
From: Holger Kipp
To: Pete French
Cc: frank@exit.com, pjklist@ekahuna.com, stable@FreeBSD.ORG
Subject: Re: Status of fxp / smp problem?

Pete French wrote:
>
> > Not only sym/fxp, but also sym/ata, only sym, or only fxp or even others.
>
> Did we ever track down a time window as to *when* the changes were made
> that caused this to start happening? For me it was the update on May 22
> that started it all going wrong, but I can't (unfortunately) remember what
> date of -STABLE the machine was running up until that point.

Hmm - I could produce similar errors with 4.5-RELEASE (not exactly the
same behaviour: the fxp behaved more sluggishly, so to speak, so I didn't
get system hangs as abruptly). As a guess, code changes (improvements)
that change the timing noticeably might lead to these problems coming to
the surface more often.

> > Problem seems to manifest itself especially on systems with:
> > - shared IRQs AND
> > - SMP enabled
>
> One added thing here - I had shared IRQs between ata and sym, and the
> problem went away when I took the ata driver out of the kernel. *But* I
> do not have any devices attached to the ata controller, so (presumably)
> it could not have actually been interrupting?

You have two drivers that have to react to the same IRQ, so maybe it's
some sort of race condition... But that's more for the developers, who know
their IRQs by heart.

> Speculation: presumably with a shared IRQ the system scans devices it
> knows are attached to that IRQ until it finds one which needs service? Any
> ideas what order it will do this in - i.e. would it be possible for it
> to scan ata, followed by sym, and for there to be some oddity in the IRQ
> code that stops it continuing on to scan sym under certain circumstances?
> Unsure as to how this might happen, and I haven't looked at the IRQ code,
> but I do have a machine on which I can reproduce the problem 100% reliably
> if that helps.

Wish I had the time. *Sigh* Or is there an IRQ debugging switch somewhere
within the system? That reminds me of an old assembler problem I once had
(6502), where I couldn't debug the problem with debug statements, as they
changed the timing such that the bug didn't occur...

> PS: committing that sym workaround would be really nice as I could at least
> then use our Compaq multiprocessor machines reliably.

Hmm, looks like Gérard hasn't had the time to polish his code and commit
it yet. I'd suggest we give him some more time before we complain, as he
also has a living to make ;-)

-- 
Holger Kipp, Dipl.-Math., Systemadministrator | alogis AG
Fon: +49 (0)30 / 43 65 8 - 114                | Berliner Strasse 26
Fax: +49 (0)30 / 43 65 8 - 214                | D-13507 Berlin Tegel
email: holger.kipp@alogis.com                 | http://www.alogis.com
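
To make the speculation about shared IRQs above a bit more concrete, here
is a small userland toy model in Python. It is only a sketch and is NOT the
FreeBSD interrupt code: the names Device, dispatch_first_claim and
dispatch_all are made up for illustration, and the "spurious claim" by the
ata handler is purely an assumption about what such an oddity might look
like. It shows that if dispatch stopped at the first handler claiming the
interrupt, a driver with no work (ata) could starve the one behind it (sym),
whereas calling every handler on the line would not.

```python
# Toy model of shared-IRQ dispatch. Hypothetical names throughout;
# this does not reflect how FreeBSD actually dispatches interrupts.

class Device:
    def __init__(self, name, pending=False, spurious_claim=False):
        self.name = name
        self.pending = pending                # device really raised the IRQ
        self.spurious_claim = spurious_claim  # handler wrongly says "mine"
        self.serviced = 0

    def handler(self):
        """Return True if this handler claims the interrupt."""
        if self.pending:
            self.pending = False
            self.serviced += 1
            return True
        return self.spurious_claim


def dispatch_first_claim(devices):
    """Scan handlers in order and stop at the first one that claims the IRQ."""
    for dev in devices:
        if dev.handler():
            return dev.name
    return None


def dispatch_all(devices):
    """Call every handler on the shared line, regardless of earlier claims."""
    return [dev.name for dev in devices if dev.handler()]


def run(dispatch, ata_spurious):
    """Fire one interrupt where only sym has work; return sym's service count."""
    ata = Device("ata", spurious_claim=ata_spurious)  # nothing attached
    sym = Device("sym", pending=True)                 # has real work
    dispatch([ata, sym])
    return sym.serviced


if __name__ == "__main__":
    for dispatch in (dispatch_first_claim, dispatch_all):
        for spurious in (False, True):
            print(dispatch.__name__, "ata spurious:", spurious,
                  "-> sym serviced:", run(dispatch, spurious))
```

In this model sym only goes unserviced in the "stop at first claim" case
with a misbehaving ata handler, which would at least be consistent with the
problem disappearing once ata is removed from the kernel. Whether anything
like this happens in the real code is for someone who knows it to say.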