From owner-freebsd-stable@FreeBSD.ORG Wed Oct 4 20:49:54 2006
Return-Path:
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A02F716A415
	for ; Wed, 4 Oct 2006 20:49:54 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F18A643D5C
	for ; Wed, 4 Oct 2006 20:49:45 +0000 (GMT)
	(envelope-from scottl@samsco.org)
Received: from [10.10.3.185] ([165.236.175.187]) (authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id k94KnbCx086513;
	Wed, 4 Oct 2006 14:49:42 -0600 (MDT)
	(envelope-from scottl@samsco.org)
Message-ID: <45241E59.2070506@samsco.org>
Date: Wed, 04 Oct 2006 14:49:29 -0600
From: Scott Long
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.12) Gecko/20060206
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Guy Brand
References: <451AA7B1.5080202@samsco.org>
	<20060927191402.GB932@turion.vk2pj.dyndns.org>
	<20060927210349.GG14975@tnn.dglawrence.com>
	<451AEB02.2090806@samsco.org>
	<002201c6e290$45ece980$b3db87d4@multiplay.co.uk>
	<451BD89F.8080203@samsco.org>
	<451C1F6D.2020302@mail.uni-mainz.de>
	<7.0.1.0.0.20060928152807.17bbe448@sentex.net>
	<451C271A.9040904@samsco.org>
	<20060930011904.GA62626@nowhere>
	<20061004103154.GK1276@isis.u-strasbg.fr>
In-Reply-To: <20061004103154.GK1276@isis.u-strasbg.fr>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=0.0 required=3.8 tests=none autolearn=failed version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: freebsd-stable@freebsd.org
Subject: Re: CALL FOR TESTERS!
	[Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 04 Oct 2006 20:49:54 -0000

Guy Brand wrote:
> Craig Boston (craig@feniz.gank.org) on 29/09/2006 at 20:19 wrote:
>
>
>>One thing this patch definitely did do though, is break the nvidia
>>driver pretty badly. Couldn't keep the X server running for more than a
>>minute before it froze solid. Lots of Xid: blah blah blah messages.
>>Yes I remembered to rebuild the kernel module ;)
>
>
> Hi,
>
>
> Since rebuilding to 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Mon
> Oct 2 15:24:04 CEST 2006 DEBUG i386 on a box having em sharing
> IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):
>
> interrupt                          total       rate
> irq1: atkbd0                           5          0
> irq14: ata0                           47          0
> irq16: nvidia0 em+                 86545        185
> irq17: fwohci0                         7          0
> irq21: twe0                         6426         13
> cpu0: timer                       927735       1986
> Total                            1020765       2185
>
> I freeze the box by starting firefox which reloads a few tabs I keep
> open in my session when under X. This is perfectly reproductible.
> From the logs, first I see:
>
> Oct 2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010597
> Oct 2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 00000000
> Oct 2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010598
> Oct 2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010599
> Oct 2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059a
> Oct 2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059b
> Oct 2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059c
> Oct 2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059d
> Oct 2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059e
> Oct 2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
> Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0
>
> then come the watchdogs:
>
> Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
> Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
> Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
> Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
> Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
> Oct 2 16:49:06 mojito kernel: em0: link state changed to DOWN
> Oct 2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
> Oct 2 16:49:08 mojito kernel: em0: link state changed to UP
> Oct 2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
> Oct 2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
> Oct 2 16:49:16 mojito kernel: em0: link state changed to DOWN
> Oct 2 16:49:18 mojito kernel: em0: link state changed to UP
> Oct 2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a4
> Oct 2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
> Oct 2 16:49:26 mojito kernel: em0: link state changed to DOWN
> Oct 2 16:49:29 mojito kernel: em0: link state changed to UP
> Oct 2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a5
> Oct 2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
> Oct 2 16:49:36 mojito kernel: em0: link state changed to DOWN
> Oct 2 16:49:39 mojito kernel: em0: link state changed to UP
> Oct 2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
> Oct 2 16:49:47 mojito kernel: em0: link state changed to DOWN
> Oct 2 16:49:49 mojito kernel: em0: link state changed to UP
>
> and the box ends up frozen less than a minute later. The traffic
> on the Intel card can be low (pinging a host for a few dozen of
> seconds), medium (reloading a few pages in the tabs of Firefox) or
> high (downloading several iso images from our local FTP mirror):
> whatever I do, if both nvidia and em0 are used, the box freezes.
>
> Note that I can't freeze the box when doing several simultaneous big
> downloads or taring up a lot of files but NOT running X. So I guess
> it is a shared nvidia/em IRQ issue.
>
> FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
> The "DEBUG" kernconf is GENERIC + witness options enabled (but they
> do not help in this case).
>
> I traced back to find which changeset introduced the trouble. The
> results are:
>
> #*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
> # OK
> ...
>
> #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
> # OK
> #
> #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
> # BROKEN
> ...
>
> #*default release=cvs tag=RELENG_6
> # BROKEN
>
> From sys commitlogs the culprit commits are:
>
> glebius 2006-08-08 09:19:25 utc
> freebsd src repository
>
> modified files: (branch: releng_6)
> sys/dev/em if_em.c
> log:
> sync with head. this includes the following changes in chronological
> order:
>
> o a significant performance improvements. the interrupt handler
>   schedules work to a private taskqueue. the em_rxeof() function
>   runs lockless.
>   rev. 1.98 - 1.101 by scottl.
>   rev. 1.103 by mux
>   rev. 1.106 by glebius, from andrey v. elsukov
>   rev. 1.116 by glebius
> o style cleanups:
>   - rev. 1.102, 1.108, 1.109 by glebius
>   - rev. 1.124 by pdeuskar
> o vendor merges:
>   - merged with vendor driver version 5.1.5 by jack vogel.
>     rev. 1.115 by glebius
>   - merged with vendor driver version 6.0.5 by jack vogel.
>     rev. 1.123 by glebius
> o various fixes:
>   - invalid use of bus_dma_allocnow
>     rev. 1.104 by scott, 1.121 by yongari
>   - link state handling cleanup.
>     rev. 1.110 by glebius
>   - fix if_baudrate handling.
>     rev. 1.111 by glebius
>   - honor iff_drv_oactive in em_start_locked().
>     rev. 1.117 by yongari
>   - protect eeprom access with the driver lock.
>     rev. 1.118 by yongari
>   - fix link flap on siocgifaddr.
>     rev. 1.119 by yongari
>   - fix dma map handling in em_encap().
>     rev. 1.120,1.122 by yongari
>
> revision   changes       path
> 1.65.2.17  +1587 -1443   src/sys/dev/em/if_em.c
>
>
> glebius 2006-08-08 09:20:26 utc
> freebsd src repository
>
> modified files: (branch: releng_6)
> sys/dev/em license readme if_em.h if_em_hw.c
>            if_em_hw.h if_em_osdep.h
> log:
> sync with head, merging vendor drivers updates 5.1.5, 6.0.5 by jack vogel.
>
> revision   changes       path
> 1.3.2.1    +1 -1         src/sys/dev/em/license
> 1.10.2.1   +71 -30       src/sys/dev/em/readme
> 1.32.2.3   +133 -157     src/sys/dev/em/if_em.h
> 1.16.2.2   +3186 -906    src/sys/dev/em/if_em_hw.c
> 1.15.2.3   +712 -48      src/sys/dev/em/if_em_hw.h
> 1.14.2.2   +46 -15       src/sys/dev/em/if_em_osdep.h
>
>
> I confirmed that by building a kernel from 2006.08.08.09.21.00 which
> shows the problem and a kernel from 2006.08.08.09.18.00 which works
> like a charm.
>
> Dunno if this could be linked to the em* watchdogs reported in this
> thread. Let me know if I can do something useful to help fixing this
> issue.
>

So you tested before these two changes and after these two changes, yes?
What about with just the first change and not the second?

Anyways, I'm starting to see a trend here. Problem reports are
clustering around UP systems, not SMP systems. I don't know if that's
just coincidence or not. Can you try a quick test? Reboot and press
'6' at the FreeBSD loader menu. That will drop you to a prompt. Then
enter the following line:

set hint.apic.0.disabled=1

Then continue the boot by entering:

boot

The machine should boot up normally. If it doesn't boot, just reset the
machine and allow it to boot without the apic change. With the change,
as well as the up to date em driver, see if you still get the nvidia and
other problems.

Scott
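
For anyone repeating this test, it may help to check whether em and nvidia still share an interrupt line before and after the apic change. The following is only a rough sketch: it assumes the interrupt listing quoted above is `vmstat -i` output and that lines keep the "irqN: devices total rate" layout. The names `shared_irqs`, `LINE_RE` and `SAMPLE` are made up for illustration and are not part of FreeBSD.

```python
import re

# Hypothetical helper, not part of FreeBSD. It assumes input in the
# "interrupt total rate" layout quoted in the message above, which is
# presumed (not confirmed) to be `vmstat -i` output.
LINE_RE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(text):
    """Return {irq: [device, ...]} for lines naming two or more devices."""
    shared = {}
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skips the header, "cpu0: timer" and "Total" lines
        irq, names = m.group(1), m.group(2)
        devices = names.split()
        if len(devices) > 1:
            shared[irq] = devices
    return shared


# Sample copied from the quoted report; device names are kept exactly as
# they were printed there, including the "em+" entry.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           5          0
irq14: ata0                           47          0
irq16: nvidia0 em+                 86545        185
irq17: fwohci0                         7          0
irq21: twe0                         6426         13
cpu0: timer                       927735       1986
Total                            1020765       2185
"""

if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, "shared by", ", ".join(devices))
```

Running it on a listing captured with and without the apic hint would show whether the shared line changes. It says nothing by itself about which driver is at fault.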