Date: Sun, 11 Apr 2004 16:38:02 +0200
From: Palle Girgensohn <girgen@pingpong.net>
To: Ruslan Ermilov <ru@FreeBSD.org>, Bruce Evans <bde@zeta.org.au>
Cc: net@FreeBSD.org
Subject: Re: sk ethernet driver: watchdog timeout
Message-ID: <4670000.1081694282@palle.girgensohn.se>
In-Reply-To: <20040408193618.GA1919@ip.net.ua>
References: <20240000.1079394807@palle.girgensohn.se> <wpy8q04buf.fsf@heho.snv.jussieu.fr> <3810000.1081299464@palle.girgensohn.se> <20040407235838.K11719@gamplex.bde.org> <20040408193618.GA1919@ip.net.ua>
Hi again,

I just tried FreeBSD RELENG_5_2 on one of our boxes with an sk0 interface, and dmesg is very noisy during boot when the kernel attaches the sk0 driver. Perhaps these dmesg lines can help. The interface stops every so often on 5.2 as well, so it is unusable. :(

Is there any point in trying any of the patches from this thread?

/Palle

skc0: <Marvell Gigabit Ethernet> port 0x9000-0x90ff mem 0xe8000000-0xe8003fff irq 5 at device 4.0 on pci1
skc0: Yukon Gigabit Ethernet 10/100/1000Base-T Adapter
sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
malloc() of "512" with the following non-sleepable locks held:
exclusive sleep mutex skc0 (network driver) r = 0 (0xc62311c0) locked @ /4/usr/5src/sys/pci/if_sk.c:1368
Stack backtrace:
backtrace(c094bc3c,c0c219a4,116,1,c08bf820) at backtrace+0x17
witness_warn(5,0,c087566d,c082122f,55) at witness_warn+0x193
uma_zalloc_arg(c1038e00,0,102,20,c0864b62) at uma_zalloc_arg+0xa9
malloc(110,c08bf820,102,0,180) at malloc+0xcf
if_attach(c6232000,c623200c,10,c0864b62,0) at if_attach+0x240
ether_ifattach(c6232000,c62321d4,0,0,ffffffff) at ether_ifattach+0x22
sk_attach(c6231080,c61ba84c,c0886510,c0c21ab0,c062b75f) at sk_attach+0x359
device_probe_and_attach(c6231080,c620b700,c0c21b04,c0a3d797,c620b180) at device_probe_and_attach+0xa9
bus_generic_attach(c620b180,11a,1,c0c21af0,ffffffff) at bus_generic_attach+0x19
skc_attach(c620b180,c61b984c,c0886510,0,e) at skc_attach+0x7d7
device_probe_and_attach(c620b180,1,c0c21b68,c05873f1,c620b700) at device_probe_and_attach+0xa9
bus_generic_attach(c620b700,1,78,c0c21b58,1) at bus_generic_attach+0x19
pci_attach(c620b700,c620b700,c0877468,1,c6161800) at pci_attach+0xa1
device_probe_and_attach(c620b700,c6161800,c0c21bbc,c0589356,c6161800) at device_probe_and_attach+0xa9
bus_generic_attach(c6161800,c0877468,1,c6161800,c0c21be8) at bus_generic_attach+0x19
pcib_attach(c6161800,c61e704c,c0886510,0,c6161a00) at pcib_attach+0x46
device_probe_and_attach(c6161800,0,c0c21c20,c05873f1,c6207780) at device_probe_and_attach+0xa9
bus_generic_attach(c6207780,0,78,c0c21c10,0) at bus_generic_attach+0x19
pci_attach(c6207780,c618684c,c0886510,0,0) at pci_attach+0xa1
device_probe_and_attach(c6207780,0,c0c21c84,c07f6ccd,c6207800) at device_probe_and_attach+0xa9
bus_generic_attach(c6207800,c0877468,0,c0c21c74,0) at bus_generic_attach+0x19
legacy_pcib_attach(c6207800,c61e884c,c0886510,c2262ce0,c6207900) at legacy_pcib_attach+0x9d
device_probe_and_attach(c6207800,c6207900,c0c21ce0,c07e169b,c6207900) at device_probe_and_attach+0xa9
bus_generic_attach(c6207900,c0c21ce0,c0654aeb,c08f7228,c6207900) at bus_generic_attach+0x19
legacy_attach(c6207900,c61d284c,c0886510,0,c08faec0) at legacy_attach+0x1b
device_probe_and_attach(c6207900,c6207a00,c0c21d2c,c07e98fc,c6207a00) at device_probe_and_attach+0xa9
bus_generic_attach(c6207a00,c6207a00,c0c21d58,c0650489,c6207a00) at bus_generic_attach+0x19
nexus_attach(c6207a00,c61de04c,c0886510,0,c226ea40) at nexus_attach+0x1c
device_probe_and_attach(c6207a00,c226ea40,c0c21d7c,c07d7319,c2282a00) at device_probe_and_attach+0xa9
root_bus_configure(c2282a00,c087914c,0,c0c21d98,c060fb49) at root_bus_configure+0x1b
configure(0,c1ec00,c1e000,c1ec00,c1e000) at configure+0x29
mi_startup() at mi_startup+0x99
begin() at begin+0x2c
sk0: Ethernet address: 00:0e:a6:2b:d5:17
miibus1: <MII bus> on sk0
e1000phy0: <Marvell 88E1000 Gigabit PHY> on miibus1
lock order reversal
 1st 0xc62311c0 skc0 (network driver) @ /4/usr/5src/sys/pci/if_sk.c:672
 2nd 0xc091aaa0 kernel environment (kernel environment) @ /4/usr/5src/sys/kern/kern_environment.c:288
Stack backtrace:
backtrace(c085f834,c091aaa0,c0859523,c0859523,c08594fb) at backtrace+0x17
witness_checkorder(c091aaa0,1,c08594fb,120,c096d000) at witness_checkorder+0x6f6
_sx_slock(c091aaa0,c08594fb,120,c08f6700,a) at _sx_slock+0x8e
getenv(c0841e7e,0,c0651084,28,c6230f00) at getenv+0x3b
getenv_quad(c0841e7e,c0c21950,c6230f00,c0c2196c,c0c21980) at getenv_quad+0x1a
getenv_int(c0841e7e,c09124e8,c6230f00,c0c21980,c0654aeb) at getenv_int+0x18
e1000phy_attach(c6230f00,c61ea84c,c0886510,c06510d1,c086a855) at e1000phy_attach+0x1d
device_probe_and_attach(c6230f00,c6231000,c0c219dc,c0561149,c6231000) at device_probe_and_attach+0xa9
bus_generic_attach(c6231000,f0000000,c0a3c610,c0a3c650,c6231000) at bus_generic_attach+0x19
miibus_attach(c6231000,c6231000,2b3,1,0) at miibus_attach+0x59
device_probe_and_attach(c6231000,0,c0c21a38,c056152a,c6231080) at device_probe_and_attach+0xa9
bus_generic_attach(c6231080,0,1,0,c6232000) at bus_generic_attach+0x19
mii_phy_probe(c6231080,c62321e4,c0a3c610,c0a3c650,ffffffff) at mii_phy_probe+0x10a
sk_attach(c6231080,c61ba84c,c0886510,c0c21ab0,c062b75f) at sk_attach+0x3a2
device_probe_and_attach(c6231080,c620b700,c0c21b04,c0a3d797,c620b180) at device_probe_and_attach+0xa9
bus_generic_attach(c620b180,11a,1,c0c21af0,ffffffff) at bus_generic_attach+0x19
skc_attach(c620b180,c61b984c,c0886510,0,e) at skc_attach+0x7d7
device_probe_and_attach(c620b180,1,c0c21b68,c05873f1,c620b700) at device_probe_and_attach+0xa9
bus_generic_attach(c620b700,1,78,c0c21b58,1) at bus_generic_attach+0x19
pci_attach(c620b700,c620b700,c0877468,1,c6161800) at pci_attach+0xa1
device_probe_and_attach(c620b700,c6161800,c0c21bbc,c0589356,c6161800) at device_probe_and_attach+0xa9
bus_generic_attach(c6161800,c0877468,1,c6161800,c0c21be8) at bus_generic_attach+0x19
pcib_attach(c6161800,c61e704c,c0886510,0,c6161a00) at pcib_attach+0x46
device_probe_and_attach(c6161800,0,c0c21c20,c05873f1,c6207780) at device_probe_and_attach+0xa9
bus_generic_attach(c6207780,0,78,c0c21c10,0) at bus_generic_attach+0x19
pci_attach(c6207780,c618684c,c0886510,0,0) at pci_attach+0xa1
device_probe_and_attach(c6207780,0,c0c21c84,c07f6ccd,c6207800) at device_probe_and_attach+0xa9
bus_generic_attach(c6207800,c0877468,0,c0c21c74,0) at bus_generic_attach+0x19
legacy_pcib_attach(c6207800,c61e884c,c0886510,c2262ce0,c6207900) at legacy_pcib_attach+0x9d
device_probe_and_attach(c6207800,c6207900,c0c21ce0,c07e169b,c6207900) at device_probe_and_attach+0xa9
bus_generic_attach(c6207900,c0c21ce0,c0654aeb,c08f7228,c6207900) at bus_generic_attach+0x19
legacy_attach(c6207900,c61d284c,c0886510,0,c08faec0) at legacy_attach+0x1b
device_probe_and_attach(c6207900,c6207a00,c0c21d2c,c07e98fc,c6207a00) at device_probe_and_attach+0xa9
bus_generic_attach(c6207a00,c6207a00,c0c21d58,c0650489,c6207a00) at bus_generic_attach+0x19
nexus_attach(c6207a00,c61de04c,c0886510,0,c226ea40) at nexus_attach+0x1c
device_probe_and_attach(c6207a00,c226ea40,c0c21d7c,c07d7319,c2282a00) at device_probe_and_attach+0xa9
root_bus_configure(c2282a00,c087914c,0,c0c21d98,c060fb49) at root_bus_configure+0x1b
configure(0,c1ec00,c1e000,c1ec00,c1e000) at configure+0x29
mi_startup() at mi_startup+0x99
begin() at begin+0x2c
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
skc0: [GIANT-LOCKED]


--On Thursday, April 08, 2004 22.36.18 +0300 Ruslan Ermilov <ru@FreeBSD.org> wrote:

> On Thu, Apr 08, 2004 at 12:17:06AM +1000, Bruce Evans wrote:
> [...]
>> The following patch reduces the problem on A7V8X-E a little.  It limits
>> the tx queue to 1 packet and fixes handling of the timeout on txeof.
>> The first part probably makes the second part a no-op.  Without this,
>> my A7V8X-E hangs on even light nfs activity (e.g., copying a 1MB file
>> to nfs).  With it, it takes heavier nfs activity to hang (makeworld
>> never completes, and a flood ping always hangs).
>>
>> I first suspected an interrupt-related bug, but the bug seems to be
>> more hardware-specific.  Examination of the output queues shows that
>> the tx sometimes just stops before processing all packets.  Resetting
>> in sk_watchdog() doesn't always fix the problem, and the timeout usually
>> stops firing after a couple of unsuccessful resets, giving a completely
>> hung device.  But the problem may be related to interrupt timing, since
>> it is much smaller under RELENG_4.  RELENG_4 hangs about as often
>> without this hack as -current does with it.
>>
>> nv0 hangs similarly.  fxp0 just works.
>>
>> %%%
>> Index: if_sk.c
>> ===================================================================
>> RCS file: /home/ncvs/src/sys/pci/if_sk.c,v
>> retrieving revision 1.78
>> diff -u -2 -r1.78 if_sk.c
>> --- if_sk.c	31 Mar 2004 12:35:51 -0000	1.78
>> +++ if_sk.c	1 Apr 2004 07:33:58 -0000
>> @@ -1830,4 +1830,9 @@
>>  	SK_IF_LOCK(sc_if);
>>  
>> +	if (sc_if->sk_cdata.sk_tx_cnt > 0) {
>> +		SK_IF_UNLOCK(sc_if);
>> +		return;
>> +	}
>> +
>>  	idx = sc_if->sk_cdata.sk_tx_prod;
>>  
>> @@ -1853,4 +1858,5 @@
>>  		 */
>>  		BPF_MTAP(ifp, m_head);
>> +		break;
>>  	}
>>  
>> @@ -2000,5 +2031,4 @@
>>  		sc_if->sk_cdata.sk_tx_cnt--;
>>  		SK_INC(idx, SK_TX_RING_CNT);
>> -		ifp->if_timer = 0;
>>  	}
>>  
>> @@ -2007,4 +2037,6 @@
>>  	if (cur_tx != NULL)
>>  		ifp->if_flags &= ~IFF_OACTIVE;
>> +
>> +	ifp->if_timer = (sc_if->sk_cdata.sk_tx_cnt == 0) ? 0 : 5;
>>  
>>  	return;
>> %%%
>>
> Always recharging the timer to 5 when there's some TX work still
> left is a bug.  With DEVICE_POLLING (yes, I have plans to add
> polling(4) support for sk(4) too), sk_txeof() will be called
> periodically, and if the card gets stuck, the if_timer will
> never count down to zero, and sk_watchdog() will never be called.
> Without DEVICE_POLLING, recharging it back to 5 even when
> if_timer reaches 0 is still pointless, because when if_timer is
> 0 while in sk_txeof(), it means it's called by sk_watchdog(),
> which will reinit the card and both RX and TX lists, making them
> empty, so having the if_timer with the value of 5 _after_
> executing the watchdog cleaning and having _no_ TX activity at
> all may cause a second (false) watchdog.  My version of the
> TX fixes (which also fixes resetting of IFF_OACTIVE):
>
> %%%
> Index: if_sk.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/pci/if_sk.c,v
> retrieving revision 1.78
> diff -u -p -r1.78 if_sk.c
> --- if_sk.c	31 Mar 2004 12:35:51 -0000	1.78
> +++ if_sk.c	8 Apr 2004 19:10:50 -0000
> @@ -1998,14 +1998,14 @@ sk_txeof(sc_if)
>  			sc_if->sk_cdata.sk_tx_chain[idx].sk_mbuf = NULL;
>  		}
>  		sc_if->sk_cdata.sk_tx_cnt--;
> +		ifp->if_flags &= ~IFF_OACTIVE;
>  		SK_INC(idx, SK_TX_RING_CNT);
> -		ifp->if_timer = 0;
>  	}
>  
>  	sc_if->sk_cdata.sk_tx_cons = idx;
>  
> -	if (cur_tx != NULL)
> -		ifp->if_flags &= ~IFF_OACTIVE;
> +	if (sc_if->sk_cdata.sk_tx_cnt == 0)
> +		ifp->if_timer = 0;
>  
>  	return;
>  }
> %%%
>
> We have been running the 3COM 3C940 card on 4.9 (and from today
> on 4.10-BETA) without any problems and under a heavy TX load.
>
>
> Cheers,
> -- 
> Ruslan Ermilov
> ru@FreeBSD.org
> FreeBSD committer