From: "Jack Vogel"
To: "Scott Long"
Cc: freebsd-stable@freebsd.org, John Baldwin
Date: Wed, 27 Sep 2006 10:10:41 -0700
Subject: Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Message-ID: <2a41acea0609271010w45d79c86ne82b45bc9b551e4a@mail.gmail.com>
In-Reply-To: <451AA7B1.5080202@samsco.org>

As an optional data point, you might wish to consider the Intel driver
I am about to release. It has everything that 6.2 does EXCEPT the
interrupt changes. I kept those out because I didn't want to break
backward compatibility.

If someone who has repro'd this problem wants to check this, speak up
and I'll send a tarball.

Jack

On 9/27/06, Scott Long wrote:
> Oliver Brandmueller wrote:
> > Hi,
> >
> > On Wed, Sep 27, 2006 at 08:55:53AM -0700, Jeremy Chadwick wrote:
> >
> >>>The SMBus Interface is not used at all (it's not even really usable).
> >>>Anyway, as soon as I unload the ichsmb module I cannot trigger the
> >>>problem anymore. If I load it again, the problem can again be triggered
> >>>by a buildworld. Statistical relevance: I did 4 buildworlds, alternating
> >>>the load/unload of ichsmb - both times with ichsmb loaded I saw 3
> >>>watchdog timeouts while the buildworld was running; with ichsmb
> >>>not loaded I did not see a single watchdog timeout. The use of the
> >>>interface was about the same the whole time (constant NFS traffic
> >>>of around 1-2 MBit/s).
> >>
> >>Interesting find. For what it's worth -- I too load the appropriate
> >>smbus drivers on the system with the "em0 problem" (loading smbus and
> >>ichsmb). That system is a single processor / single core system, with
> >>HT disabled in the BIOS (which doesn't matter since FreeBSD disables
> >>it anyways). Kernel is non-SMP. Only reason I mention this is:
> >>
> >>
> >>>The UP/SMP idea seems to be only of interest, because on an UP machine
> >>>it's more likely to share interrupts than on SMP machines, it has
> >>>nothing to do with the fact of UP or SMP itself.
> >
> >
> > I don't think it has to do with ichsmb in particular, but only with
> > the fact that ichsmb is, for me, exactly the thing that shares the
> > interrupt with the em interface that shows the problems.
> >
> > - Oliver
> >
>
> My theory here is that something in the kernel, likely VM/VFS, is
> holding the Giant lock for an inordinate amount of time. During this
> time, an interrupt fires on the shared em/ichsmb interrupt. The em
> interrupt handler runs and schedules a task to handle the event. Then
> the system blocks the interrupt at the PIC and schedules the ichsmb
> ithread. However, as soon as this ithread tries to run, it gets blocked
> on the Giant lock that is held elsewhere. While it is blocked, the
> interrupt stays masked at the PIC, blocking out both ichsmb and em
> device interrupts. Normally the interrupt would be unmasked at the PIC
> after the ithread has run, but until the ithread unblocks, this cannot
> happen. This goes on long enough that pending transactions on the em
> interface trigger a timeout.
>
> Assuming this analysis is correct, there are a couple of questions.
> First would be, why is the ithread being blocked for so long? Is the
> Giant lock actually being held continuously for that long, or is it
> being dropped and relocked often but the scheduler isn't giving the
> ithread a chance to grab it and run? Second is, why is this only being
> noticed now? Whether the em driver uses an INTR_FAST handler, like it
> does now, or an ithread handler, like it used to in 6.1, doesn't affect
> the ichsmb driver and its interaction with the Giant lock. Maybe there
> isn't a direct correlation here, and it's just a coincidence that
> something else in the system changed at the same time as the driver
> changing.
>
> I have a few ideas on tracking down the root cause, but they are
> pretty painful and slow. The root cause does need to be found and
> fixed, as it's either a very bad scheduler bug, or a very badly
> misbehaving subsystem.
> Both have implications for other possible problems in FreeBSD. Also,
> the usb driver has the same potential for blocking as the ichsmb
> driver, as do other drivers. But in the meantime, something needs to
> be done for 6.2. The options are:
>
> 1. Revert the em driver to its 6.1 form, ask people to test if the
> problem persists. If it doesn't, leave it at that for now.
>
> 2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither
> uses an ithread. Without an ithread, no PIC masking will happen, and
> these drivers can block all they want without interfering with the
> em driver. This is a bit of risky work, though, and may not be possible
> if the devices don't support certain functionality. Also, it doesn't
> address the root problem. But, getting more interrupt handlers away
> from needing Giant is a good thing, even if this is only a band-aid.
>
> 3. Spend the time tracking down and fixing the root problem for 6.2.
> This is ideal, but it is also an unbounded problem. Thus, it is
> absolutely not conducive to a timely and successful 6.2 release.
>
> 4. Do nothing for now and tell people to disable usb, ichsmb, etc, as
> needed. This, of course, is not a good option.
>
> Option 1 is the quickest and likely most risk-free fix for the 6.2
> release. If someone could test doing a revert and report back, I would
> appreciate it. Any volunteers?
>
> Scott
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
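
For anyone who wants to see the shape of the masking theory quoted above,
here is a rough user-space sketch. It is not FreeBSD code and it does not
reproduce the bug; it is a toy tick-based model under made-up assumptions.
The function name simulate, its parameters, and all tick counts are
hypothetical. It only illustrates that when a shared line stays masked
while an ithread waits on a long-held lock, the other device on that line
goes unserviced long enough to trip a watchdog.

```python
"""Toy model of a shared interrupt line that stays masked while an ithread
waits for a lock. Not FreeBSD code; all names and tick counts are made up."""


def simulate(giant_hold_ticks, giant_cycle=200, total_ticks=1000,
             irq_period=5, watchdog_ticks=50, ichsmb_uses_ithread=True):
    """Return the number of em watchdog timeouts seen in the run.

    Giant is held for the first `giant_hold_ticks` of every `giant_cycle`.
    The em device raises the shared interrupt every `irq_period` ticks.
    """
    masked = False          # is the shared line masked at the PIC?
    last_em_service = 0     # last tick at which em's handler ran
    timeouts = 0

    for tick in range(total_ticks):
        giant_held = (tick % giant_cycle) < giant_hold_ticks

        # A blocked ichsmb ithread can only finish once Giant is free;
        # only then is the shared line unmasked again.
        if masked and not giant_held:
            masked = False

        # Interrupt delivery: a masked line delivers nothing to either driver.
        if tick % irq_period == 0 and not masked:
            last_em_service = tick          # em's fast handler runs
            # The line is masked for the ichsmb ithread; it stays masked
            # if that ithread immediately blocks on Giant.
            masked = ichsmb_uses_ithread and giant_held

        # em watchdog: no serviced interrupt for too long counts as a timeout.
        if tick - last_em_service >= watchdog_ticks:
            timeouts += 1
            last_em_service = tick          # watchdog resets the device

    return timeouts


if __name__ == "__main__":
    print("short Giant hold:     ", simulate(giant_hold_ticks=10))
    print("long Giant hold:      ", simulate(giant_hold_ticks=120))
    print("long hold, no ithread:", simulate(giant_hold_ticks=120,
                                              ichsmb_uses_ithread=False))
```

The last call loosely corresponds to option 2 in the quoted list: if the
other driver on the line never needs an ithread, the line is never left
masked in this model, so the em side keeps getting serviced even during a
long Giant hold. Whether that carries over to the real drivers is exactly
the open question in the thread.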