Date:      Wed, 04 Oct 2006 14:49:29 -0600
From:      Scott Long <scottl@samsco.org>
To:        Guy Brand <gb@isis.u-strasbg.fr>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Message-ID:  <45241E59.2070506@samsco.org>
In-Reply-To: <20061004103154.GK1276@isis.u-strasbg.fr>
References:  <451AA7B1.5080202@samsco.org>	<20060927191402.GB932@turion.vk2pj.dyndns.org>	<20060927210349.GG14975@tnn.dglawrence.com>	<451AEB02.2090806@samsco.org>	<002201c6e290$45ece980$b3db87d4@multiplay.co.uk>	<451BD89F.8080203@samsco.org> <451C1F6D.2020302@mail.uni-mainz.de>	<7.0.1.0.0.20060928152807.17bbe448@sentex.net>	<451C271A.9040904@samsco.org> <20060930011904.GA62626@nowhere> <20061004103154.GK1276@isis.u-strasbg.fr>

Guy Brand wrote:
> Craig Boston (craig@feniz.gank.org) on 29/09/2006 at 20:19 wrote:
> 
> 
>>One thing this patch definitely did do though, is break the nvidia
>>driver pretty badly.  Couldn't keep the X server running for more than a
>>minute before it froze solid.  Lots of Xid: blah blah blah messages.
>>Yes I remembered to rebuild the kernel module ;)
> 
> 
>   Hi,
> 
> 
>   Since rebuilding to 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Mon
>   Oct  2 15:24:04 CEST 2006 DEBUG  i386 on a box having em sharing
>   IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):
> 
>   interrupt                          total       rate
>   irq1: atkbd0                           5          0
>   irq14: ata0                           47          0
>   irq16: nvidia0 em+                 86545        185
>   irq17: fwohci0                         7          0
>   irq21: twe0                         6426         13
>   cpu0: timer                       927735       1986
>   Total                            1020765       2185
> 
>   I freeze the box by starting firefox which reloads a few tabs I keep
>   open in my session when under X. This is perfectly reproducible.
>   From the logs, first I see:
> 
>     Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010597
>     Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 00000000
>     Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010598
>     Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010599
>     Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059a
>     Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059b
>     Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059c
>     Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059d
>     Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059e
>     Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
>     Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0
> 
>   then come the watchdogs:
> 
>     Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
>     Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
>     Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
>     Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
>     Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
>     Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
>     Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
>     Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
>     Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
>     Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
>     Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
>     Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
>     Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a4
>     Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
>     Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
>     Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
>     Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a5
>     Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
>     Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
>     Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
>     Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
>     Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
>     Oct  2 16:49:49 mojito kernel: em0: link state changed to UP
> 
>   and the box ends up frozen less than a minute later. The traffic
>   on the Intel card can be low (pinging a host for a few dozen
>   seconds), medium (reloading a few pages in the tabs of Firefox) or
>   high (downloading several iso images from our local FTP mirror):
>   whatever I do, if both nvidia and em0 are used, the box freezes.
> 
>   Note that I can't freeze the box when doing several simultaneous big
>   downloads or tarring up a lot of files but NOT running X. So I guess
>   it is a shared nvidia/em IRQ issue.
> 
>   FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
>   The "DEBUG" kernconf is GENERIC + witness options enabled (but they
>   do not help in this case).
> 
>   I traced back to find which changeset introduced the trouble. The
>   results are:
> 
>     #*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
>     # OK
>     ...
> 
>     #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
>     # OK
>     #
>     #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
>     # BROKEN
>     ...
> 
>     #*default release=cvs tag=RELENG_6
>     # BROKEN
> 
>   From sys commitlogs the culprit commits are:
> 
>   glebius     2006-08-08 09:19:25 utc
>   freebsd src repository
> 
>   modified files:        (branch: releng_6)
>     sys/dev/em           if_em.c 
>   log:
>   sync with head. this includes the following changes in chronological
>   order:
>   
>   o a significant performance improvements. the interrupt handler
>     schedules work to a private taskqueue. the em_rxeof() function
>     runs lockless.
>     rev. 1.98 - 1.101 by scottl.
>     rev. 1.103 by mux
>     rev. 1.106 by glebius, from andrey v. elsukov <bu7cher yandex.ru>
>     rev. 1.116 by glebius
>   o style cleanups:
>     - rev. 1.102, 1.108, 1.109 by glebius
>     - rev. 1.124 by pdeuskar
>   o vendor merges:
>     - merged with vendor driver version 5.1.5 by jack vogel.
>       rev. 1.115 by glebius
>     - merged with vendor driver version 6.0.5 by jack vogel.
>       rev. 1.123 by glebius
>   o various fixes:
>     - invalid use of bus_dma_allocnow
>       rev. 1.104 by scott, 1.121 by yongari
>     - link state handling cleanup.
>       rev. 1.110 by glebius
>     - fix if_baudrate handling.
>       rev. 1.111 by glebius
>     - honor iff_drv_oactive in em_start_locked().
>       rev. 1.117 by yongari
>     - protect eeprom access with the driver lock.
>       rev. 1.118 by yongari
>     - fix link flap on siocgifaddr.
>       rev. 1.119 by yongari
>     - fix dma map handling in em_encap().
>       rev. 1.120,1.122 by yongari
>   
>   revision   changes      path
>   1.65.2.17  +1587 -1443  src/sys/dev/em/if_em.c
> 
> 
>   glebius     2006-08-08 09:20:26 utc
>   freebsd src repository
> 
>   modified files:        (branch: releng_6)
>     sys/dev/em           license readme if_em.h if_em_hw.c 
>                          if_em_hw.h if_em_osdep.h 
>   log:
>   sync with head, merging vendor drivers updates 5.1.5, 6.0.5 by jack vogel.
>   
>   revision  changes     path
>   1.3.2.1   +1 -1       src/sys/dev/em/license
>   1.10.2.1  +71 -30     src/sys/dev/em/readme
>   1.32.2.3  +133 -157   src/sys/dev/em/if_em.h
>   1.16.2.2  +3186 -906  src/sys/dev/em/if_em_hw.c
>   1.15.2.3  +712 -48    src/sys/dev/em/if_em_hw.h
>   1.14.2.2  +46 -15     src/sys/dev/em/if_em_osdep.h
> 
> 
>   I confirmed that by building a kernel from 2006.08.08.09.21.00 which
>   shows the problem and a kernel from 2006.08.08.09.18.00 which works
>   like a charm.
> 
>   Dunno if this could be linked to the em* watchdogs reported in this
>   thread. Let me know if I can do something useful to help fix this
>   issue.
> 

So you tested before these two changes and after these two changes, yes?
What about with just the first change and not the second?  Anyway, I'm
starting to see a trend here.  Problem reports are clustering around UP
(uniprocessor) systems, not SMP systems.  I don't know if that's just
coincidence or not.

Can you try a quick test?  Reboot and press '6' at the FreeBSD loader
menu.  That will drop you to a prompt.  Then enter the following line:

set hint.apic.0.disabled=1

Then continue the boot by entering:

boot

The machine should boot up normally.  If it doesn't boot, just reset the
machine and allow it to boot without the APIC change.  With the change,
as well as the up-to-date em driver, see if you still get the nvidia Xid
errors and the em watchdog timeouts.
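
If the one-off test helps and you want to repeat it over several reboots
without retyping it at the prompt, the same hint can, as far as I know, be
placed in /boot/loader.conf.  This is only a sketch for testing, assuming a
stock install where that file is read by the loader; it is not meant as a
permanent setting:

```
# /boot/loader.conf  (test only; remove the line to get the APIC back)
# Same hint as typed at the loader prompt above.
hint.apic.0.disabled="1"
```

After the machine is up, "kenv hint.apic.0.disabled" should print the value
if the hint was picked up, and "vmstat -i" will show whether em and nvidia
still end up on the same IRQ.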

Scott



