From owner-freebsd-stable@FreeBSD.ORG Wed Oct 4 10:34:13 2006
Date: Wed, 4 Oct 2006 12:31:54 +0200
From: Guy Brand
To: freebsd-stable@freebsd.org
Subject: Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Message-ID: <20061004103154.GK1276@isis.u-strasbg.fr>
In-Reply-To: <20060930011904.GA62626@nowhere>
References: <451AA7B1.5080202@samsco.org>
 <20060927191402.GB932@turion.vk2pj.dyndns.org>
 <20060927210349.GG14975@tnn.dglawrence.com>
 <451AEB02.2090806@samsco.org>
 <002201c6e290$45ece980$b3db87d4@multiplay.co.uk>
 <451BD89F.8080203@samsco.org>
 <451C1F6D.2020302@mail.uni-mainz.de>
 <7.0.1.0.0.20060928152807.17bbe448@sentex.net>
 <451C271A.9040904@samsco.org>
 <20060930011904.GA62626@nowhere>
List-Id: Production branch of FreeBSD source code

Craig Boston (craig@feniz.gank.org) on 29/09/2006 at 20:19 wrote:

> One thing this patch definitely did do though, is break the nvidia
> driver pretty badly. Couldn't keep the X server running for more than a
> minute before it froze solid. Lots of Xid: blah blah blah messages.
> Yes I remembered to rebuild the kernel module ;)

  Hi,

  Since rebuilding to 6.2-PRERELEASE

    FreeBSD 6.2-PRERELEASE #1: Mon Oct 2 15:24:04 CEST 2006 DEBUG i386

  on a box where em shares an IRQ with nvidia
  (NVIDIA-FreeBSD-x86-1.0-8756):

    interrupt                  total       rate
    irq1: atkbd0                   5          0
    irq14: ata0                   47          0
    irq16: nvidia0 em+         86545        185
    irq17: fwohci0                 7          0
    irq21: twe0                 6426         13
    cpu0: timer               927735       1986
    Total                    1020765       2185

  I can freeze the box by starting Firefox under X, which reloads a few
  tabs I keep open in my session. This is perfectly reproducible.
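
  For anyone who wants to check another box for the same kind of IRQ
  sharing, here is a minimal sketch in Python. It assumes `vmstat -i`
  output in the format shown above; the function name `shared_irqs` and
  the `SAMPLE` constant are made up for illustration and are not part of
  any FreeBSD tool. The trailing `+` in `em+` is treated as part of the
  (apparently truncated) device name and is left untouched.

```python
import re

# Matches rows such as "irq16: nvidia0 em+    86545    185":
# an irq label, one or more device names, then the total and rate columns.
ROW = re.compile(r"^(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq: [device names]} for IRQ rows listing more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, cpu timer and Total rows do not match
        irq = match.group(1)
        names = match.group(2).split()
        if len(names) > 1:
            shared[irq] = names
    return shared


# Sample taken from the vmstat -i output quoted in this message.
SAMPLE = """\
interrupt                  total       rate
irq1: atkbd0                   5          0
irq14: ata0                   47          0
irq16: nvidia0 em+         86545        185
irq17: fwohci0                 7          0
irq21: twe0                 6426         13
cpu0: timer               927735       1986
Total                    1020765       2185
"""

if __name__ == "__main__":
    for irq, names in shared_irqs(SAMPLE).items():
        print(irq, "is shared by", ", ".join(names))
```

  On a live system the same function could be fed the text captured from
  `vmstat -i` instead of the sample string.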

  From the logs, first I see:

    Oct 2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010597
    Oct 2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 00000000
    Oct 2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010598
    Oct 2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010599
    Oct 2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059a
    Oct 2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059b
    Oct 2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059c
    Oct 2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059d
    Oct 2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059e
    Oct 2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
    Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0

  then come the watchdogs:

    Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
    Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
    Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
    Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
    Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
    Oct 2 16:49:06 mojito kernel: em0: link state changed to DOWN
    Oct 2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
    Oct 2 16:49:08 mojito kernel: em0: link state changed to UP
    Oct 2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
    Oct 2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
    Oct 2 16:49:16 mojito kernel: em0: link state changed to DOWN
    Oct 2 16:49:18 mojito kernel: em0: link state changed to UP
    Oct 2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a4
    Oct 2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
    Oct 2 16:49:26 mojito kernel: em0: link state changed to DOWN
    Oct 2 16:49:29 mojito kernel: em0: link state changed to UP
    Oct 2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a5
    Oct 2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
    Oct 2 16:49:36 mojito kernel: em0: link state changed to DOWN
    Oct 2 16:49:39 mojito kernel: em0: link state changed to UP
    Oct 2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
    Oct 2 16:49:47 mojito kernel: em0: link state changed to DOWN
    Oct 2 16:49:49 mojito kernel: em0: link state changed to UP

  and the box ends up frozen less than a minute later. The traffic on the
  Intel card can be low (pinging a host for a few dozen seconds), medium
  (reloading a few pages in the tabs of Firefox) or high (downloading
  several ISO images from our local FTP mirror): whatever I do, if both
  nvidia and em0 are in use, the box freezes. Note that I cannot freeze
  the box with several simultaneous big downloads, or by tarring up a lot
  of files, as long as X is NOT running. So I guess it is a shared
  nvidia/em IRQ issue.

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such
  problem. The "DEBUG" kernconf is GENERIC + witness options enabled (but
  they do not help in this case).

  I traced back to find which changeset introduced the trouble. The
  results are:

    #*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00   # OK
    ...
    #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56   # OK
    #
    #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00   # BROKEN
    ...
    #*default release=cvs tag=RELENG_6                            # BROKEN

  From the sys commit logs, the culprit commits are:

    glebius     2006-08-08 09:19:25 UTC

      FreeBSD src repository

      Modified files:        (Branch: RELENG_6)
        sys/dev/em           if_em.c
      Log:
      Sync with HEAD. This includes the following changes in
      chronological order:

      o A significant performance improvements. The interrupt handler
        schedules work to a private taskqueue. The em_rxeof() function
        runs lockless.
        rev. 1.98 - 1.101 by scottl.
        rev. 1.103 by mux
        rev. 1.106 by glebius, from Andrey V. Elsukov
        rev. 1.116 by glebius
      o Style cleanups:
        - rev. 1.102, 1.108, 1.109 by glebius
        - rev. 1.124 by pdeuskar
      o Vendor merges:
        - Merged with vendor driver version 5.1.5 by Jack Vogel.
          rev. 1.115 by glebius
        - Merged with vendor driver version 6.0.5 by Jack Vogel.
          rev. 1.123 by glebius
      o Various fixes:
        - Invalid use of BUS_DMA_ALLOCNOW
          rev. 1.104 by scott, 1.121 by yongari
        - Link state handling cleanup.
          rev. 1.110 by glebius
        - Fix if_baudrate handling.
          rev. 1.111 by glebius
        - Honor IFF_DRV_OACTIVE in em_start_locked().
          rev. 1.117 by yongari
        - Protect EEPROM access with the driver lock.
          rev. 1.118 by yongari
        - Fix link flap on SIOCGIFADDR.
          rev. 1.119 by yongari
        - Fix DMA map handling in em_encap().
          rev. 1.120,1.122 by yongari

      Revision   Changes        Path
      1.65.2.17  +1587 -1443    src/sys/dev/em/if_em.c

    glebius     2006-08-08 09:20:26 UTC

      FreeBSD src repository

      Modified files:        (Branch: RELENG_6)
        sys/dev/em           LICENSE README if_em.h if_em_hw.c
                             if_em_hw.h if_em_osdep.h
      Log:
      Sync with HEAD, merging vendor drivers updates 5.1.5, 6.0.5
      by Jack Vogel.

      Revision   Changes        Path
      1.3.2.1    +1 -1          src/sys/dev/em/LICENSE
      1.10.2.1   +71 -30        src/sys/dev/em/README
      1.32.2.3   +133 -157      src/sys/dev/em/if_em.h
      1.16.2.2   +3186 -906     src/sys/dev/em/if_em_hw.c
      1.15.2.3   +712 -48       src/sys/dev/em/if_em_hw.h
      1.14.2.2   +46 -15        src/sys/dev/em/if_em_osdep.h

  I confirmed that by building a kernel from 2006.08.08.09.21.00, which
  shows the problem, and a kernel from 2006.08.08.09.18.00, which works
  like a charm.

  I don't know whether this is related to the em watchdog timeouts
  reported in this thread. Let me know if I can do something useful to
  help fix this issue.

--
  bug