From: "Vadim Mikhailov"
Date: Mon, 28 Jun 2004 10:32:00 -0700
Message-ID: <678213ABF77E5D4F9E6CF1DA61A4E2D518413E@usmilm005.palm1.palmone.com>
Subject: Re: [kern/68351] bge0 watchdog timeout on 5.2.1 and -current, 5.1 is ok
List-Id: Discussions about the use of FreeBSD-current

Hi,

I have a Dell PowerEdge 1750 server with 2 Xeon 3.0 GHz CPUs, 4 GB RAM
and 2 onboard gigabit ethernet ports:

    bge0: mem 0xfcd20000-0xfcd2ffff,0xfcd30000-0xfcd3ffff irq 17 at device 0.0 on pci2
    bge1: mem 0xfcd00000-0xfcd0ffff,0xfcd10000-0xfcd1ffff irq 18 at device 0.1 on pci2

Only bge0 is used, with jumbo frames (my gigabit switch
PowerConnect 5224 supports them):

    bge0: flags=8843 mtu 9000
            options=1b
            inet 172.xx.xx.xx netmask 0xfffff800 broadcast 172.xx.xx.255
            ether 00:06:5b:ef:63:e6
            media: Ethernet autoselect (1000baseTX )
            status: active

This box has two dual-port SCSI adapters:

    mpt0: port 0xbc00-0xbcff mem 0xfcb20000-0xfcb2ffff,0xfcb30000-0xfcb3ffff irq 13 at device 5.0 on pci4
    mpt1: port 0xb800-0xb8ff mem 0xfcb00000-0xfcb0ffff,0xfcb10000-0xfcb1ffff irq 16 at device 5.1 on pci4
    ahc0: port 0xdc00-0xdcff mem 0xfcf01000-0xfcf01fff irq 19 at device 4.0 on pci1
    ahc1: port 0xd800-0xd8ff mem 0xfcf00000-0xfcf00fff irq 20 at device 4.1 on pci1

Each adapter has disks attached to it. Firmware on the motherboard and
all peripheral devices has been upgraded to the very latest versions
from Dell.

This setup works more or less OK under FreeBSD 5.1-RELEASE-p8 (GENERIC
kernel with SMP enabled), but once every month or two the machine
reboots under load, so I want to upgrade it to 5.2.1-RELEASE.

But when I boot a 5.2.1-RELEASE or later kernel (-current) on this box,
the network adapter locks up. I see these messages on the console and
in the logs:

    Jun 25 15:25:22 vortex kernel: bge0: watchdog timeout -- resetting

If I do "ifconfig bge0 down up", the network becomes available for a
few seconds and then the machine is not pingable again. I ran
"systat -v" and noticed that ping stops working exactly when I see any
interrupt coming to mpt or ahc (i.e. on any disk activity).

One visible difference between 5.1 (where it works) and 5.2.1/current
(where it doesn't) is that interrupts are assigned to PCI devices
differently. IRQ map under 5.1:

    mpt0 13, mpt1 16, bge0 17, bge1 18, ahc0 19, ahc1 20

and under 5.2.1:

    mpt0 18, mpt1 19, bge0 16, bge1 17, ahc0 20, ahc1 21

I have tried to change the IRQ assignment of PCI devices in the BIOS,
but it didn't change anything from FreeBSD's point of view. I have also
tried to boot 5.2.1 with ACPI disabled; the result is the same.
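
To make the IRQ difference easier to read, here is a minimal Python sketch that tabulates the two maps quoted above. The values are copied from this message (bge1 at IRQ 18 under 5.1 is taken from the boot line for bge1); the names irq_changes and shared_irqs are hypothetical helpers for illustration, not part of any FreeBSD tool.

```python
# IRQ assignments as reported in this message (device -> IRQ).
IRQ_5_1 = {"mpt0": 13, "mpt1": 16, "bge0": 17, "bge1": 18, "ahc0": 19, "ahc1": 20}
IRQ_5_2_1 = {"mpt0": 18, "mpt1": 19, "bge0": 16, "bge1": 17, "ahc0": 20, "ahc1": 21}


def irq_changes(old, new):
    """Return {device: (old_irq, new_irq)} for devices whose IRQ differs."""
    return {
        dev: (old[dev], new[dev])
        for dev in old
        if dev in new and old[dev] != new[dev]
    }


def shared_irqs(irq_map):
    """Return {irq: [devices]} for IRQs used by more than one listed device."""
    by_irq = {}
    for dev, irq in irq_map.items():
        by_irq.setdefault(irq, []).append(dev)
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


if __name__ == "__main__":
    # Show every device whose IRQ moved between the two releases.
    for dev, (old, new) in sorted(irq_changes(IRQ_5_1, IRQ_5_2_1).items()):
        print(f"{dev}: {old} -> {new}")
    # Show whether any two of the listed devices share an IRQ.
    print("shared under 5.1:  ", shared_irqs(IRQ_5_1))
    print("shared under 5.2.1:", shared_irqs(IRQ_5_2_1))
```

This only covers the six devices listed here; other devices in the machine may still share these IRQs, so it is a way to eyeball the change rather than a diagnosis.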
Disabling jumbo frames does not seem to have any effect either.

Also, I tried this on another identical 1750 box (I have a few of
them), with the same result. It works fine under Linux kernel 2.4.18.

Is there any way I can track this down? I can provide more information
(verbose boot logs etc.) if needed...

All this information has also been filed in this bug report:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/68351

Thanks!

--
Vadim Mikhailov