Date:      Wed, 9 Feb 2011 13:41:30 -0800
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Peter Lai <cowbert@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: bge wedging 8.2-RC1
Message-ID:  <20110209214130.GB10080@michelle.cdnetworks.com>
In-Reply-To: <AANLkTingnHrD4N9AMQ84UqGX_VTTqQBg-pLqJCvk-x58@mail.gmail.com>
References:  <AANLkTikQuaz1JSuFt=p49HTkqxBm7FaNeTmb9LXJU8Kg@mail.gmail.com> <20110208013841.GB1306@michelle.cdnetworks.com> <AANLkTingnHrD4N9AMQ84UqGX_VTTqQBg-pLqJCvk-x58@mail.gmail.com>


--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Feb 07, 2011 at 08:27:43PM -0600, Peter Lai wrote:
> On Feb 7, 2011 7:38 PM, "Pyun YongHyeon" <pyunyh@gmail.com> wrote:
> >
> > On Mon, Feb 07, 2011 at 06:09:16PM -0600, Peter Lai wrote:
> > > Hello
> > >
> > > I've got a new Dell Precision workstation here with a BCM5761 on intel
> > > mobo for westmere xeons that is wedging with interrupt storm and will
> > > lockup the system randomly. I have turned HTT and auto powermanagement
> > > off in bios (system cannot sleep), lowest cpu acpi state is C1.
> > >
> > > Here is dmesg:
> > > bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev.
> > > 0x5761100> mem 0xf3be0000-0xf3beffff,0xf3bf0000-0xf3bfffff irq 17 at
> > > device 0.0 on pci6
> > > bge0: CHIP ID 0x05761100; ASIC REV 0x5761; CHIP REV 0x57611; PCI-E
> > > miibus0: <MII bus> on bge0
> > > brgphy0: <BCM5761 10/100/1000baseTX PHY> PHY 1 on miibus0
> > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > >
> > > Here is pciconf -lv:
> > > bge0@pci0:6:0:0:      class=0x020000 card=0x026d1028 chip=0x168114e4
> > > rev=0x10 hdr=0x00
> > >     vendor     = 'Broadcom Corporation'
> > >     device     = 'Broadcom 57XX Gigabit Integrated Controller (BCM5761)'
> > >     class      = network
> > >     subclass   = ethernet
> > >
> > > here is the setup in rc.conf:
> > >
> > > ifconfig_bge0="polling -tso -vlanhwtso -vlanhwtag -vlanmtu inet
> > > 192.168.123.124 netmask 255.255.255.0"
> > >
> > > I have the card plugged into a dlink DSS8 100mbps switch with one
> > > other 100mbps device on it (rich man's crossover cable).
> > >
> > > Before turning off TSO4 and VLAN tagging (because I don't use them),
> > > the card would do several things:
> > > 1. 1 out of 3 reboots: Fail to bring interface up. ifconfig would hang
> > > and systat/vmstat showed 800+ interrupts per second on IRQ256
> >
> > This is strange. bge(4) does not use MSI if you build bge(4) with
> > DEVICE_POLLING so seeing IRQ256 interrupts looks odd to me.
> > Are you sure bge(4) is using IRQ256?
> 
> This is with GENERIC. I will rebuild with POLLING and try...
> 

Let me know whether the attached patch makes any difference on your box.
The patch contains some other changes, but they should not affect your
BCM5761 controller. If you see a "CLKREQ enabled" message after
applying the patch, please let me know about that too.
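
For anyone skimming the diff, the CLKREQ part of the workaround is
roughly a read-modify-write of bit 8 (0x0100) in the PCIe Link Control
register. Below is a minimal standalone sketch of that bit logic only.
The names LNKCTL_CLKREQ_EN, enum link_speed and clkreq_fixup() are made
up for illustration and do not exist in the driver; the real code works
on IFM_SUBTYPE() values and uses pci_read_config()/pci_write_config().

```c
#include <stdint.h>

/* Hypothetical name for the bit the patch writes as the literal 0x0100. */
#define LNKCTL_CLKREQ_EN	0x0100u

/* Hypothetical stand-ins for the IFM_10_T/IFM_100_TX/IFM_1000_T subtypes. */
enum link_speed { SPEED_10, SPEED_100, SPEED_1000 };

/*
 * Compute the Link Control value to write back: the CLKREQ bit is
 * cleared at 10/100Mbps and set otherwise.  All other bits are kept.
 */
static uint16_t
clkreq_fixup(uint16_t lnkctl, enum link_speed speed)
{
	if (speed == SPEED_10 || speed == SPEED_100)
		lnkctl &= (uint16_t)~LNKCTL_CLKREQ_EN;
	else
		lnkctl |= LNKCTL_CLKREQ_EN;
	return (lnkctl);
}
```

With this sketch, clkreq_fixup(0x0142, SPEED_100) gives 0x0042 and
clkreq_fixup(0x0042, SPEED_1000) gives 0x0142.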

> >
> > > 2. After a few hours lock up the system, requiring hard reboot
> > >
> > > After disabling TSO4 and VLAN stuff:
> > > bge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > >       options=80083<RXCSUM,TXCSUM,VLAN_HWCSUM,LINKSTATE>
> > >       media: Ethernet autoselect (100baseTX
> > > <full-duplex,flowcontrol,rxpause,txpause>)
> > >
> > > Everything seemed fine for about two weeks and then suddenly started
> > > acting up again, locked up, after hard reboot, soft reboot, link will
> > > not come up and I see interrupt storm again....
> > >
> >
> > If you don't use DEVICE_POLLING, rebuild bge(4) with
> > DEVICE_POLLING. For most cases, you don't need to enable polling on
> > intelligent controllers like bge(4).
> >
> > I also have BCM5761 PCIe controller which shows no such issues. I
> > know there is an edge case(send BD corruption) for BCM5761/BCM5784/
> > BCM57780 which needs to be investigated. I'm not sure you're seeing
> > that edge case though.
> >
> > > I am close to buying an intel card to replace the bcm, but then I
> > > noticed that the main intel desktop PCI-E card is 82574L-based and
> > > people are having em driver wedging on that too. So now I have broken
> > > ethernet on this box; my primary link is atheros 5212 pci card and I
> > > may be out of pci slots (or else I might try a pci intel card).
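
The "other changes" in the attached patch are for BCM57780 only and
amount to clearing the L1 PLL power down enable bit and setting the
disable bit in the BGE_PCIE_LNKCTL register. A minimal sketch of that
bit manipulation follows. The two bit values are copied from the
if_bgereg.h hunk; the helper name pll_pd_disable() is hypothetical, and
the driver does this inline with CSR_READ_4()/CSR_WRITE_4().

```c
#include <stdint.h>

/* Bit values as defined in the if_bgereg.h hunk of the patch. */
#define BGE_PCIE_LNKCTL_L1_PLL_PD_ENB	0x00000008u
#define BGE_PCIE_LNKCTL_L1_PLL_PD_DIS	0x00000080u

/*
 * Hypothetical helper: return the register value with PLL power down
 * disabled.  Bits other than ENB and DIS are left untouched.
 */
static uint32_t
pll_pd_disable(uint32_t val)
{
	val &= ~BGE_PCIE_LNKCTL_L1_PLL_PD_ENB;
	val |= BGE_PCIE_LNKCTL_L1_PLL_PD_DIS;
	return (val);
}
```

For example, pll_pd_disable(0x0000000F) gives 0x00000087.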

--X1bOJ3K7DJ5YkBrT
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="bge.clkreq.diff"

Index: sys/dev/bge/if_bgereg.h
===================================================================
--- sys/dev/bge/if_bgereg.h	(revision 218409)
+++ sys/dev/bge/if_bgereg.h	(working copy)
@@ -2004,6 +2004,11 @@
 #define	BGE_EECTL_DATAOUT		0x00000010
 #define	BGE_EECTL_DATAIN		0x00000020
 
+/* PCIe Link control register */
+#define	BGE_PCIE_LNKCTL			0x7D54
+#define	BGE_PCIE_LNKCTL_L1_PLL_PD_ENB	0x00000008
+#define	BGE_PCIE_LNKCTL_L1_PLL_PD_DIS	0x00000080
+
 /* MDI (MII/GMII) access register */
 #define	BGE_MDI_DATA			0x00000001
 #define	BGE_MDI_DIR			0x00000002
@@ -2769,6 +2774,7 @@
 #define	BGE_FLAG_4G_BNDRY_BUG	0x02000000
 #define	BGE_FLAG_RX_ALIGNBUG	0x04000000
 #define	BGE_FLAG_SHORT_DMA_BUG	0x08000000
+#define	BGE_FLAG_CLKREQ_BUG	0x10000000
 	uint32_t		bge_phy_flags;
 #define	BGE_PHY_WIRESPEED	0x00000001
 #define	BGE_PHY_ADC_BUG		0x00000002
Index: sys/dev/bge/if_bge.c
===================================================================
--- sys/dev/bge/if_bge.c	(revision 218409)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -879,6 +879,8 @@
 {
 	struct bge_softc *sc;
 	struct mii_data *mii;
+	uint16_t lnkctl;
+
 	sc = device_get_softc(dev);
 	mii = device_get_softc(sc->bge_miibus);
 
@@ -905,6 +907,18 @@
 		sc->bge_link = 0;
 	if (sc->bge_link == 0)
 		return;
+	/* Disable CLKREQ when controller is running at 10/100Mbps. */
+	if (sc->bge_flags & BGE_FLAG_CLKREQ_BUG) {
+		lnkctl = pci_read_config(sc->bge_dev, sc->bge_expcap +
+		    PCIR_EXPRESS_LINK_CTL, 2);
+		if (IFM_SUBTYPE(mii->mii_media_active) == IFM_10_T ||
+		    IFM_SUBTYPE(mii->mii_media_active) == IFM_100_TX)
+			lnkctl &= ~0x0100;
+		else
+			lnkctl |= 0x0100;
+		pci_write_config(sc->bge_dev, sc->bge_expcap +
+		    PCIR_EXPRESS_LINK_CTL, lnkctl, 2);
+	}
 	BGE_CLRBIT(sc, BGE_MAC_MODE, BGE_MACMODE_PORTMODE);
 	if (IFM_SUBTYPE(mii->mii_media_active) == IFM_1000_T ||
 	    IFM_SUBTYPE(mii->mii_media_active) == IFM_1000_SX)
@@ -1383,6 +1397,13 @@
 	    i < BGE_STATUS_BLOCK_END + 1; i += sizeof(uint32_t))
 		BGE_MEMWIN_WRITE(sc, i, 0);
 
+	/* Disable PCIe PLL power down after reset. */
+	if (sc->bge_asicrev == BGE_ASICREV_BCM57780) {
+		val = CSR_READ_4(sc, BGE_PCIE_LNKCTL);
+		val &= ~BGE_PCIE_LNKCTL_L1_PLL_PD_ENB;
+		val |= BGE_PCIE_LNKCTL_L1_PLL_PD_DIS;
+		CSR_WRITE_4(sc, BGE_PCIE_LNKCTL, val);
+	}
 	if (sc->bge_chiprev == BGE_CHIPREV_5704_BX) {
 		/*
 		 *  Fix data corruption caused by non-qword write with WB.
@@ -2911,7 +2932,28 @@
 	 */
 	if (BGE_IS_5714_FAMILY(sc) && (sc->bge_flags & BGE_FLAG_PCIX))
 		sc->bge_flags |= BGE_FLAG_40BIT_BUG;
+
 	/*
+	 * Controllers that are known to corrupt send BDs when CLKREQ is
+	 * enabled.  Disabling CLKREQ at 10/100Mbps works around it.
+	 */
+	if (sc->bge_flags & BGE_FLAG_PCIE) {
+		if (sc->bge_asicrev == BGE_ASICREV_BCM5761 ||
+		    sc->bge_asicrev == BGE_ASICREV_BCM5784 ||
+		    sc->bge_chipid == BGE_CHIPID_BCM57780_A0 ||
+		    sc->bge_chipid == BGE_CHIPID_BCM57780_A1) {
+			if (pci_read_config(sc->bge_dev,
+			    sc->bge_expcap + PCIR_EXPRESS_LINK_CTL, 2) &
+			    0x0100) {
+#if 1
+				device_printf(dev, "CLKREQ enabled\n");
+#endif
+				sc->bge_flags |= BGE_FLAG_CLKREQ_BUG;
+			}
+		}
+	}
+
+	/*
 	 * Allocate the interrupt, using MSI if possible.  These devices
 	 * support 8 MSI messages, but only the first one is used in
 	 * normal operation.
@@ -3322,6 +3364,17 @@
 	 */
 	bge_writemem_ind(sc, BGE_SOFTWARE_GENCOMM, BGE_MAGIC_NUMBER);
 
+	/*
+	 * Disable PCIe PLL power down otherwise 57780 may not respond
+	 * to reset.
+	 */
+	if (sc->bge_asicrev == BGE_ASICREV_BCM57780) {
+		val = CSR_READ_4(sc, BGE_PCIE_LNKCTL);
+		val &= ~BGE_PCIE_LNKCTL_L1_PLL_PD_ENB;
+		val |= BGE_PCIE_LNKCTL_L1_PLL_PD_DIS;
+		CSR_WRITE_4(sc, BGE_PCIE_LNKCTL, val);
+	}
+
 	reset = BGE_MISCCFG_RESET_CORE_CLOCKS | BGE_32BITTIME_66MHZ;
 
 	/* XXX: Broadcom Linux driver. */
@@ -3342,9 +3395,6 @@
 	if (BGE_IS_5705_PLUS(sc))
 		reset |= BGE_MISCCFG_GPHY_PD_OVERRIDE;
 
-	/* Issue global reset */
-	write_op(sc, BGE_MISC_CFG, reset);
-
 	if (sc->bge_asicrev == BGE_ASICREV_BCM5906) {
 		val = CSR_READ_4(sc, BGE_VCPU_STATUS);
 		CSR_WRITE_4(sc, BGE_VCPU_STATUS,
@@ -3354,6 +3404,9 @@
 		    val & ~BGE_VCPU_EXT_CTRL_HALT_CPU);
 	}
 
+	/* Issue global reset */
+	write_op(sc, BGE_MISC_CFG, reset);
+
 	DELAY(1000);
 
 	/* XXX: Broadcom Linux driver. */

--X1bOJ3K7DJ5YkBrT--


