Date:      Tue, 8 May 2018 14:58:00 -0400
From:      Stephen Hurd <shurd@llnw.com>
To:        Harry Schmalzbauer <freebsd@omnilan.de>
Cc:        Sean Bruno <sbruno@freebsd.org>, Kevin Bowling <kevin.bowling@kev009.com>,  "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Stephen Hurd <shurd@freebsd.org>
Subject:   Re: iflib-if_em tests with HEAD and lagg panic [Was: Re: svn commit: r333338 - in stable/11/sys: dev/bnxt kern net sys]
Message-ID:  <CAGK_Ob1XR_D_B=vXeHtMQwHA2yXhhWPfMtpwHKwDbGfoWgaOVw@mail.gmail.com>
In-Reply-To: <5AF1E073.5010701@omnilan.de>
References:  <201805072142.w47LgN1R041002@repo.freebsd.org> <5AF16B8B.7030703@omnilan.de> <CAK7dMtBkCvLgPVnsf%2BECcrdbKNvOShONeZ=vqvg3dJ5ZeuoP5w@mail.gmail.com> <5AF17134.7020602@omnilan.de> <CAK7dMtB3V1F=2AxtsbUznn5DO81G3Zkh9UYiN3eWkyOfV_CYmg@mail.gmail.com> <5AF1CF0F.4040909@omnilan.de> <65972f0d-2873-42ea-464c-a3db543abafb@freebsd.org> <5AF1E073.5010701@omnilan.de>

Can you test the review here? https://reviews.freebsd.org/D15355

It looks like there are two different locks protecting the same data
everywhere but in lagg_ioctl().  This is a rough first pass, and there may
still be some lingering recursion and performance regressions with it.

On Tue, May 8, 2018 at 1:37 PM, Harry Schmalzbauer <freebsd@omnilan.de>
wrote:

> Regarding Sean Bruno's message of 08.05.2018 18:44 (localtime):
> >
> >
> > On 05/08/18 10:23, Harry Schmalzbauer wrote:
> >> Regarding Kevin Bowling's message of 08.05.2018 11:52 (localtime):
> >> …
> >>>> But if the simple iflib/hw-support test with kawela+hartwell helps, I'm
> >>>> happy to do it.
> >>>
> >>> At this point it would be helpful, we think e1000 is nearing pretty
> >>> good shape and I need to become familiar with any outstanding bugs.
> >>
> >> I started with hartwell:
> >> em1: attach_pre capping queues at 2
> >>
> >> Current cap: 0x460b
> >> em1: using 1024 tx descriptors and 1024 rx descriptors
> >> em1: msix_init qsets capped at 2
> >> em1: pxm cpus: 2 queue msgs: 4 admincnt: 1
> >> em1: using 2 rx queues 2 tx queues
> >> em1: Using MSIX interrupts with 3 vectors
> >> em1: allocated for 2 tx_queues
> >> em1: allocated for 2 rx_queues
> >> em1: Ethernet address: 00:1b:21:3e:90:52
> >> em1: netmap queues/slots: TX 2/1024, RX 2/1024
> >> dev.em.1.iflib.driver_version: 7.6.1-k
> >> dev.em.1.queue_rx_1.rx_irq: 0
> >> dev.em.1.queue_rx_1.rxd_tail: 607
> >> dev.em.1.queue_rx_1.rxd_head: 21
> >> dev.em.1.queue_rx_0.rx_irq: 0
> >> dev.em.1.queue_rx_0.rxd_tail: 410
> >> dev.em.1.queue_rx_0.rxd_head: 412
> >> dev.em.1.queue_tx_1.tx_irq: 0
> >> dev.em.1.queue_tx_1.txd_tail: 8
> >> dev.em.1.queue_tx_1.txd_head: 8
> >> dev.em.1.queue_tx_0.tx_irq: 0
> >> dev.em.1.queue_tx_0.txd_tail: 428
> >> dev.em.1.queue_tx_0.txd_head: 428
> >>
> >> Looks good so far, no problems with simple line speed (NFS4) copies.
> >>
> >> According to the i217 (Clarkville) Datasheet, it also supports 2 queues:
> >> Table 63. Intel® Ethernet Controller I217 Capability PHY Address 01,
> >>           Page 776, Register 19
> >> But it probably was never supported, at least I haven't ever checked
> >> pre-iflib.
> >> Here's the Clarkville:
> >> em0: attach_pre capping queues at 1
> >> em0: using 1024 tx descriptors and 1024 rx descriptors
> >> em0: msix_init qsets capped at
> >> em0: PCIY_MSIX capability not found; or rid 0 == 0.
> >> em0: Using an MSI interrupt
> >> em0: allocated for 1 tx_queues
> >> em0: allocated for 1 rx_queues
> >> em0: Ethernet address: 54:be:f7:0b:d7:4e
> >> em0: netmap queues/slots: TX 1/1024, RX 1/1024
> >>
> >> Since it's no effort here, I also tried LACP, which panicked.
> >> A vmcore is available, but what debugger to use these days? kgdb seems
> >> to be replaced...
> >>
> >> -harry
> >> _____________
> >
> > /usr/libexec/kgdb should be the old kgdb that you are used to.  Most of
> > us have switched to using devel/gdb from ports.
>
> Thanks, me stupid - it's in libexec, not in my path...
> Unfortunately I have no clue about those essential C tools, so it
> doesn't make much sense for me to waste energy installing devel/gdb ;-)
> While I'm wondering why/how LLVM/gdb can be mixed... pure lack of
> essentials :-(
>
> So back to iflib-if_em panic after setting up a if_lagg(4) interface
> (which consists of an addon 82574 and the on-board (PCH)+i217 NIC, which
> was assigned a locally administered Ethernet address and used as the first
> laggport, so the private MAC was (successfully) set on both NICs)
> and firing dhclient to get a lease:
>
>
> Sleeping on "e1000_delay" with the following non-sleepable locks held:
> exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0 (0xfffff80014228c08)
> locked @ /usr/src/sys/net/if_lagg.c:1433
> stack backtrace:
> #0 0xffffffff80701113 at witness_debugger+0x73
> #1 0xffffffff807024f1 at witness_warn+0x461
> #2 0xffffffff806a42cc at _sleep+0x6c
> #3 0xffffffff806a4b34 at pause_sbt+0x144
> #4 0xffffffff80440e21 at e1000_write_phy_reg_mdic+0xf1
> #5 0xffffffff804446bf at e1000_enable_phy_wakeup_reg_access_bm+0x2f
> #6 0xffffffff80432e0a at e1000_update_mc_addr_list_pch2lan+0x3a
> #7 0xffffffff8041408f at em_if_multi_set+0x1bf
> #8 0xffffffff807bc02e at iflib_if_ioctl+0xfe
> #9 0xffffffff82111a15 at lagg_ioctl+0x115
> #10 0xffffffff807dd348 at inm_release_task+0x218
> #11 0xffffffff806dea29 at gtaskqueue_run_locked+0x139
> #12 0xffffffff806de7a8 at gtaskqueue_thread_loop+0x88
> #13 0xffffffff80659d84 at fork_exit+0x84
> #14 0xffffffff809b767e at fork_trampoline+0xe
> Sleeping thread (tid 100017, pid 0) owns a non-sleepable lock
> KDB: stack backtrace of thread 100017:
> sched_switch() at sched_switch+0x945/frame 0xfffffe00750dc5d0
> mi_switch() at mi_switch+0x18c/frame 0xfffffe00750dc600
> sleepq_switch() at sleepq_switch+0x10d/frame 0xfffffe00750dc640
> sleepq_timedwait() at sleepq_timedwait+0x50/frame 0xfffffe00750dc680
> _sleep() at _sleep+0x307/frame 0xfffffe00750dc730
> pause_sbt() at pause_sbt+0x144/frame 0xfffffe00750dc780
> e1000_write_phy_reg_mdic() at e1000_write_phy_reg_mdic+0xf1/frame
> 0xfffffe00750dc7c0
> e1000_enable_phy_wakeup_reg_access_bm() at
> e1000_enable_phy_wakeup_reg_access_bm+0x2f/frame 0xfffffe00750dc7e0
> e1000_update_mc_addr_list_pch2lan() at
> e1000_update_mc_addr_list_pch2lan+0x3a/frame 0xfffffe00750dc820
> em_if_multi_set() at em_if_multi_set+0x1bf/frame 0xfffffe00750dc870
> iflib_if_ioctl() at iflib_if_ioctl+0xfe/frame 0xfffffe00750dc8e0
> lagg_ioctl() at lagg_ioctl+0x115/frame 0xfffffe00750dc990
> inm_release_task() at inm_release_task+0x218/frame 0xfffffe00750dc9f0
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame
> 0xfffffe00750dca40
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame
> 0xfffffe00750dca70
> fork_exit() at fork_exit+0x84/frame 0xfffffe00750dcab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00750dcab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> panic: sleeping thread
> cpuid = 3
> time = 1525794682
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe008fe180e0
> vpanic() at vpanic+0x1a3/frame 0xfffffe008fe18140
> panic() at panic+0x43/frame 0xfffffe008fe181a0
> propagate_priority() at propagate_priority+0x335/frame 0xfffffe008fe181e0
> turnstile_wait() at turnstile_wait+0x38d/frame 0xfffffe008fe18230
> __mtx_lock_sleep() at __mtx_lock_sleep+0x1e1/frame 0xfffffe008fe182b0
> __mtx_lock_flags() at __mtx_lock_flags+0xf9/frame 0xfffffe008fe18300
> _rm_rlock() at _rm_rlock+0x280/frame 0xfffffe008fe18330
> _rm_rlock_debug() at _rm_rlock_debug+0x14c/frame 0xfffffe008fe18380
> lagg_transmit() at lagg_transmit+0x38/frame 0xfffffe008fe183f0
> ether_output_frame() at ether_output_frame+0xaa/frame 0xfffffe008fe18420
> ether_output() at ether_output+0x68b/frame 0xfffffe008fe184c0
> arprequest() at arprequest+0x474/frame 0xfffffe008fe185c0
> arp_ifinit() at arp_ifinit+0x58/frame 0xfffffe008fe18600
> ether_ioctl() at ether_ioctl+0x1d1/frame 0xfffffe008fe18630
> lagg_ioctl() at lagg_ioctl+0x602/frame 0xfffffe008fe186e0
> in_control() at in_control+0x8f5/frame 0xfffffe008fe18780
> ifioctl() at ifioctl+0x19c6/frame 0xfffffe008fe18850
> kern_ioctl() at kern_ioctl+0x2b9/frame 0xfffffe008fe188b0
> sys_ioctl() at sys_ioctl+0x168/frame 0xfffffe008fe18980
> amd64_syscall() at amd64_syscall+0x2cc/frame 0xfffffe008fe18ab0
> fast_syscall_common() at fast_syscall_common+0x101/frame
> 0xfffffe008fe18ab0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004820ba, rsp =
> 0x7fffffffe1c8, rbp = 0x7fffffffe210 ---
> KDB: enter: panic
>
>
> Hope this helps,
>
> -harry
>



-- 
Stephen Hurd
Principal Engineer, Limelight Networks
+1 616 848 0643
www.limelight.com


