Date:      Wed, 27 Sep 2006 10:10:41 -0700
From:      "Jack Vogel" <jfvogel@gmail.com>
To:        "Scott Long" <scottl@samsco.org>
Cc:        freebsd-stable@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject:   Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Message-ID:  <2a41acea0609271010w45d79c86ne82b45bc9b551e4a@mail.gmail.com>
In-Reply-To: <451AA7B1.5080202@samsco.org>
References:  <451A1375.5080202@gneto.com> <20060927071538.GF22229@e-Gitt.NET> <451A4189.5020906@samsco.org> <20060927152824.GJ22229@e-Gitt.NET> <20060927155553.GB14563@icarus.home.lan> <20060927155904.GM22229@e-Gitt.NET> <451AA7B1.5080202@samsco.org>

As an optional data point, you might wish to consider the Intel
driver I am about to release. It has everything that 6.2 does
EXCEPT the interrupt changes; I kept those out because I
didn't want to break backward compatibility. If someone who
has reproduced this problem wants to test it, speak up and
I'll send a tarball.

Jack


On 9/27/06, Scott Long <scottl@samsco.org> wrote:
> Oliver Brandmueller wrote:
> > Hi,
> >
> > On Wed, Sep 27, 2006 at 08:55:53AM -0700, Jeremy Chadwick wrote:
> >
> >>>The SMBus Interface is not used at all (it's not even really usable).
> >>>Anyway, as soon as I unload the ichsmb module I cannot trigger the
> >>>problem anymore. If I load it again, the problem can again be triggered
> >>>by a buildworld. Statistical relevance: I did 4 buildworlds, alternating
> >>>the load/unload of ichsmb - both times with ichsmb loaded I saw 3
> >>>watchdog timeouts while the buildworld was running; while ichsmb was
> >>>not loaded I did not see a single watchdog timeout. The use of the
> >>>interface was about the same the whole time (constant NFS traffic
> >>>of around 1-2 MBit/s).
> >>
> >>Interesting find.  For what it's worth -- I too load the appropriate
> >>smbus drivers on the system with the "em0 problem" (loading smbus and
> >>ichsmb).  That system is a single processor / single core system, with
> >>HT disabled in the BIOS (which doesn't matter since FreeBSD disables
> >>it anyways).  Kernel is non-SMP.  Only reason I mention this is:
> >>
> >>
> >>>The UP/SMP idea seems to be only of interest, because on a UP machine
> >>>it's more likely to share interrupts than on SMP machines, it has
> >>>nothing to do with the fact of UP or SMP itself.
> >
> >
> > I don't think it has to do with ichsmb in particular here, but only
> > with the fact that ichsmb is, for me, exactly the thing that shares the
> > interrupt with the em interface that shows the problems.
> >
> > - Oliver
> >
>
> My theory here is that something in the kernel, likely VM/VFS, is
> holding the Giant lock for an inordinate amount of time.  During this
> time, an interrupt fires on the shared em/ichsmb interrupt.  The em
> interrupt handler runs and schedules a task to handle the event.  Then
> the system blocks the interrupt at the PIC and schedules the ichsmb
> ithread.  However, as soon as this ithread tries to run, it gets blocked
> on the Giant lock that is held elsewhere.  While it is blocked, the
> interrupt stays masked at the PIC, blocking out both ichsmb and em
> device interrupts.  Normally the PIC would get unmasked after the
> ithread has run, but until the ithread unblocks, this cannot happen.
> This goes on long enough that pending transactions on the em interface
> trigger a timeout.
>
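
To make the sequence in the theory above concrete, here is a toy timeline
model of it. This is only a sketch under stated assumptions: it is not
kernel code, the tick values are arbitrary, and every name in it
(simulate, Timeline, giant_released_at, and so on) is hypothetical. It
assumes the shared interrupt fires at tick 0 and that the line stays
masked until the Giant-locked ithread has been able to run.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    unmasked_at: int         # tick at which the shared line is unmasked again
    completion_seen_at: int  # tick at which em's completion interrupt is delivered
    watchdog_fired: bool     # True if em would have declared a timeout


def simulate(giant_released_at, tx_start, tx_done, watchdog_ticks,
             ithread_on_shared_line=True):
    """Toy model: the shared interrupt fires at tick 0.

    All arguments are ticks on an arbitrary clock (hypothetical values,
    not measurements from a real system).
    """
    if ithread_on_shared_line:
        # The line is masked until the ithread has run, and the ithread
        # cannot run until Giant is released by whoever holds it.
        unmasked_at = max(0, giant_released_at)
    else:
        # No ithread on the line: nothing masks it.
        unmasked_at = 0

    # em's completion interrupt is raised at tx_done, but it is only
    # delivered once the line is unmasked.
    completion_seen_at = max(tx_done, unmasked_at)
    watchdog_fired = (completion_seen_at - tx_start) > watchdog_ticks
    return Timeline(unmasked_at, completion_seen_at, watchdog_fired)
```

With Giant held until tick 50 and a 10-tick watchdog
(simulate(50, 1, 2, 10)), the model reports a timeout. With
ithread_on_shared_line=False, or with Giant released at tick 5, it does
not. That matches the shape of the reports in this thread, but it says
nothing about why Giant would be held that long.
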
> Assuming this analysis is correct, there are a couple of questions.
> First would be, why is the ithread being blocked for so long?  Is the
> Giant lock actually being held continuously for that long, or is it being
> dropped and relocked often but the scheduler isn't giving the ithread a
> chance to grab it and run?  Second is, why is this only being noticed
> now?  Whether the em driver uses an INTR_FAST handler, like it does now,
> or an ithread handler, like it used to in 6.1, doesn't affect the ichsmb
> driver and its interaction with the Giant lock.  Maybe there isn't a
> direct correlation here, and it's just a coincidence that something else
> in the system changed at the same time as the driver changing.
>
> I have a few ideas on tracking down the root cause, but they are
> pretty painful and slow.  The root cause does need to be found and
> fixed, as it's either a very bad scheduler bug, or a very badly
> misbehaving subsystem.  Both have implications for other possible
> problems in FreeBSD.  Also, the usb driver has the same potential for
> blocking as the ichsmb driver, as do other drivers.  But in the
> meantime, something needs to be done for 6.2.  The options are:
>
> 1. Revert the em driver to its 6.1 form, ask people to test if the
> problem persists.  If it doesn't, leave it at that for now.
>
> 2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither
> uses an ithread.  Without an ithread, no PIC masking will happen, and
> these drivers can block all they want without interfering with the
> em driver.  This is a bit of risky work, though, and may not be possible
> if the devices don't support certain functionality.  Also, it doesn't
> address the root problem.  But, getting more interrupt handlers away
> from needing Giant is a good thing, even if this is only a band-aid.
>
> 3. Spend the time tracking down and fixing the root problem for 6.2.
> This is ideal, but it is also an unbounded problem.  Thus, it is
> absolutely not conducive to a timely and successful 6.2 release.
>
> 4. Do nothing for now and tell people to disable usb, ichsmb, etc, as
> needed.  This, of course, is not a good option.
>
> Option 1 is the quickest and likely most risk-free fix for the 6.2
> release.  If someone could test doing a revert and report back, I would
> appreciate it.  Any volunteers?
>
> Scott
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>


