Date: Tue, 8 May 2018 14:58:00 -0400
From: Stephen Hurd <shurd@llnw.com>
To: Harry Schmalzbauer <freebsd@omnilan.de>
Cc: Sean Bruno <sbruno@freebsd.org>, Kevin Bowling <kevin.bowling@kev009.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Stephen Hurd <shurd@freebsd.org>
Subject: Re: iflib-if_em tests with HEAD and lagg panic [Was: Re: svn commit: r333338 - in stable/11/sys: dev/bnxt kern net sys]
Message-ID: <CAGK_Ob1XR_D_B=vXeHtMQwHA2yXhhWPfMtpwHKwDbGfoWgaOVw@mail.gmail.com>
In-Reply-To: <5AF1E073.5010701@omnilan.de>
References: <201805072142.w47LgN1R041002@repo.freebsd.org> <5AF16B8B.7030703@omnilan.de> <CAK7dMtBkCvLgPVnsf+ECcrdbKNvOShONeZ=vqvg3dJ5ZeuoP5w@mail.gmail.com> <5AF17134.7020602@omnilan.de> <CAK7dMtB3V1F=2AxtsbUznn5DO81G3Zkh9UYiN3eWkyOfV_CYmg@mail.gmail.com> <5AF1CF0F.4040909@omnilan.de> <65972f0d-2873-42ea-464c-a3db543abafb@freebsd.org> <5AF1E073.5010701@omnilan.de>
Can you test the review here: https://reviews.freebsd.org/D15355

It looks like there are two different locks protecting the same data everywhere but in lagg_ioctl(). This is a rough first pass, and there may be some lingering recursion and performance regressions with it.

On Tue, May 8, 2018 at 1:37 PM, Harry Schmalzbauer <freebsd@omnilan.de> wrote:
> Regarding Sean Bruno's message of 08.05.2018 18:44 (localtime):
> >
> >
> > On 05/08/18 10:23, Harry Schmalzbauer wrote:
> >> Regarding Kevin Bowling's message of 08.05.2018 11:52 (localtime):
> >> …
> >>>> But if the simple iflib/hw-support test with kawela+hartwell helps, I'm
> >>>> happy to do it.
> >>>
> >>> At this point it would be helpful; we think e1000 is nearing pretty
> >>> good shape and I need to become familiar with any outstanding bugs.
> >>
> >> I started with hartwell:
> >> em1: attach_pre capping queues at 2
> >>
> >> Current cap: 0x460b
> >> em1: using 1024 tx descriptors and 1024 rx descriptors
> >> em1: msix_init qsets capped at 2
> >> em1: pxm cpus: 2 queue msgs: 4 admincnt: 1
> >> em1: using 2 rx queues 2 tx queues
> >> em1: Using MSIX interrupts with 3 vectors
> >> em1: allocated for 2 tx_queues
> >> em1: allocated for 2 rx_queues
> >> em1: Ethernet address: 00:1b:21:3e:90:52
> >> em1: netmap queues/slots: TX 2/1024, RX 2/1024
> >> dev.em.1.iflib.driver_version: 7.6.1-k
> >> dev.em.1.queue_rx_1.rx_irq: 0
> >> dev.em.1.queue_rx_1.rxd_tail: 607
> >> dev.em.1.queue_rx_1.rxd_head: 21
> >> dev.em.1.queue_rx_0.rx_irq: 0
> >> dev.em.1.queue_rx_0.rxd_tail: 410
> >> dev.em.1.queue_rx_0.rxd_head: 412
> >> dev.em.1.queue_tx_1.tx_irq: 0
> >> dev.em.1.queue_tx_1.txd_tail: 8
> >> dev.em.1.queue_tx_1.txd_head: 8
> >> dev.em.1.queue_tx_0.tx_irq: 0
> >> dev.em.1.queue_tx_0.txd_tail: 428
> >> dev.em.1.queue_tx_0.txd_head: 428
> >>
> >> Looks good so far, no problems with simple line-speed (NFS4) copies.
> >>
> >> According to the i217 (Clarkville) datasheet, it also supports 2 queues:
> >> Table 63. Intel® Ethernet Controller I217 Capability PHY Address 01,
> >> Page 776, Register 19
> >> But it probably was never supported; at least I haven't ever checked
> >> pre-iflib.
> >> Here's the Clarkville:
> >> em0: attach_pre capping queues at 1
> >> em0: using 1024 tx descriptors and 1024 rx descriptors
> >> em0: msix_init qsets capped at
> >> em0: PCIY_MSIX capability not found; or rid 0 == 0.
> >> em0: Using an MSI interrupt
> >> em0: allocated for 1 tx_queues
> >> em0: allocated for 1 rx_queues
> >> em0: Ethernet address: 54:be:f7:0b:d7:4e
> >> em0: netmap queues/slots: TX 1/1024, RX 1/1024
> >>
> >> Since it's no effort here, I also tried LACP, which panicked.
> >> vmcore available, but what debugger to use these days? kgdb seems to be
> >> replaced...
> >>
> >> -harry
> >> _____________
> >
> > /usr/libexec/kgdb should be the old kgdb that you are used to. Most of
> > us have switched to using devel/gdb from ports.
>
> Thanks, me stupid – it's in libexec, not in my path...
> Unfortunately I have no clue about those essential C tools, so it
> doesn't make much sense for me to waste energy installing devel/gdb ;-)
> While I'm wondering why/how LLVM/gdb can be mixed... pure lack of
> essentials :-(
>
> So back to the iflib-if_em panic after setting up an if_lagg(4) interface
> (which consists of an add-on 82574 and the on-board (PCH)+i217 NIC, which
> was assigned a locally administered Ethernet address and used as the first
> laggport, so the private MAC was (successfully) set on both NICs)
> and firing dhclient to get a lease:
>
>
> Sleeping on "e1000_delay" with the following non-sleepable locks held:
> exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0 (0xfffff80014228c08)
> locked @ /usr/src/sys/net/if_lagg.c:1433
> stack backtrace:
> #0 0xffffffff80701113 at witness_debugger+0x73
> #1 0xffffffff807024f1 at witness_warn+0x461
> #2 0xffffffff806a42cc at _sleep+0x6c
> #3 0xffffffff806a4b34 at pause_sbt+0x144
> #4 0xffffffff80440e21 at e1000_write_phy_reg_mdic+0xf1
> #5 0xffffffff804446bf at e1000_enable_phy_wakeup_reg_access_bm+0x2f
> #6 0xffffffff80432e0a at e1000_update_mc_addr_list_pch2lan+0x3a
> #7 0xffffffff8041408f at em_if_multi_set+0x1bf
> #8 0xffffffff807bc02e at iflib_if_ioctl+0xfe
> #9 0xffffffff82111a15 at lagg_ioctl+0x115
> #10 0xffffffff807dd348 at inm_release_task+0x218
> #11 0xffffffff806dea29 at gtaskqueue_run_locked+0x139
> #12 0xffffffff806de7a8 at gtaskqueue_thread_loop+0x88
> #13 0xffffffff80659d84 at fork_exit+0x84
> #14 0xffffffff809b767e at fork_trampoline+0xe
> Sleeping thread (tid 100017, pid 0) owns a non-sleepable lock
> KDB: stack backtrace of thread 100017:
> sched_switch() at sched_switch+0x945/frame 0xfffffe00750dc5d0
> mi_switch() at mi_switch+0x18c/frame 0xfffffe00750dc600
> sleepq_switch() at sleepq_switch+0x10d/frame 0xfffffe00750dc640
> sleepq_timedwait() at sleepq_timedwait+0x50/frame 0xfffffe00750dc680
> _sleep() at _sleep+0x307/frame 0xfffffe00750dc730
> pause_sbt() at pause_sbt+0x144/frame 0xfffffe00750dc780
> e1000_write_phy_reg_mdic() at e1000_write_phy_reg_mdic+0xf1/frame 0xfffffe00750dc7c0
> e1000_enable_phy_wakeup_reg_access_bm() at e1000_enable_phy_wakeup_reg_access_bm+0x2f/frame 0xfffffe00750dc7e0
> e1000_update_mc_addr_list_pch2lan() at e1000_update_mc_addr_list_pch2lan+0x3a/frame 0xfffffe00750dc820
> em_if_multi_set() at em_if_multi_set+0x1bf/frame 0xfffffe00750dc870
> iflib_if_ioctl() at iflib_if_ioctl+0xfe/frame 0xfffffe00750dc8e0
> lagg_ioctl() at lagg_ioctl+0x115/frame 0xfffffe00750dc990
> inm_release_task() at inm_release_task+0x218/frame 0xfffffe00750dc9f0
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame 0xfffffe00750dca40
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe00750dca70
> fork_exit() at fork_exit+0x84/frame 0xfffffe00750dcab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00750dcab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> panic: sleeping thread
> cpuid = 3
> time = 1525794682
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008fe180e0
> vpanic() at vpanic+0x1a3/frame 0xfffffe008fe18140
> panic() at panic+0x43/frame 0xfffffe008fe181a0
> propagate_priority() at propagate_priority+0x335/frame 0xfffffe008fe181e0
> turnstile_wait() at turnstile_wait+0x38d/frame 0xfffffe008fe18230
> __mtx_lock_sleep() at __mtx_lock_sleep+0x1e1/frame 0xfffffe008fe182b0
> __mtx_lock_flags() at __mtx_lock_flags+0xf9/frame 0xfffffe008fe18300
> _rm_rlock() at _rm_rlock+0x280/frame 0xfffffe008fe18330
> _rm_rlock_debug() at _rm_rlock_debug+0x14c/frame 0xfffffe008fe18380
> lagg_transmit() at lagg_transmit+0x38/frame 0xfffffe008fe183f0
> ether_output_frame() at ether_output_frame+0xaa/frame 0xfffffe008fe18420
> ether_output() at ether_output+0x68b/frame 0xfffffe008fe184c0
> arprequest() at arprequest+0x474/frame 0xfffffe008fe185c0
> arp_ifinit() at arp_ifinit+0x58/frame 0xfffffe008fe18600
> ether_ioctl() at ether_ioctl+0x1d1/frame 0xfffffe008fe18630
> lagg_ioctl() at lagg_ioctl+0x602/frame 0xfffffe008fe186e0
> in_control() at in_control+0x8f5/frame 0xfffffe008fe18780
> ifioctl() at ifioctl+0x19c6/frame 0xfffffe008fe18850
> kern_ioctl() at kern_ioctl+0x2b9/frame 0xfffffe008fe188b0
> sys_ioctl() at sys_ioctl+0x168/frame 0xfffffe008fe18980
> amd64_syscall() at amd64_syscall+0x2cc/frame 0xfffffe008fe18ab0
> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe008fe18ab0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004820ba, rsp = 0x7fffffffe1c8, rbp = 0x7fffffffe210 ---
> KDB: enter: panic
>
>
> Hope this helps,
>
> -harry
>

-- 
Stephen Hurd
Principal Engineer
Limelight Networks
EXPERIENCE FIRST.
+1 616 848 0643
www.limelight.com
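For readers following the first trace: WITNESS complains because the if_lagg rmlock (taken exclusively at if_lagg.c:1433) is still held when lagg_ioctl() forwards a multicast-filter update to the em(4) port, and that driver path (em_if_multi_set() -> e1000_write_phy_reg_mdic()) calls pause_sbt(), which sleeps. The second trace appears to be the consequence: lagg_transmit(), reached from an ARP request issued during another ioctl, tries to read-lock the same rmlock, blocks behind the sleeping owner, and propagate_priority() panics with "sleeping thread". The user-space C sketch below models only that "no sleeping with a non-sleepable lock held" rule. Every model_* name is hypothetical and is not a kernel API, and the sleepable-lock variant (for example an sx(9) lock) is just one possible direction, not necessarily what D15355 does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the WITNESS rule violated in the
 * trace: a thread must not sleep while it holds a non-sleepable lock
 * (such as an rmlock). None of these model_* names exist in the kernel.
 */
struct model_thread {
	int nonsleepable_held;	/* non-sleepable locks currently held */
	int violations;		/* sleeps attempted while holding one */
};

/* Stand-ins for taking/dropping the if_lagg rmlock exclusively. */
static void
model_rm_wlock(struct model_thread *td)
{
	td->nonsleepable_held++;
}

static void
model_rm_wunlock(struct model_thread *td)
{
	assert(td->nonsleepable_held > 0);
	td->nonsleepable_held--;
}

/*
 * Stand-in for pause_sbt(): sleeping is only legal with no
 * non-sleepable locks held. Where the kernel would print "Sleeping on
 * ... with the following non-sleepable locks held", the model just
 * records a violation and reports failure.
 */
static bool
model_pause(struct model_thread *td)
{
	if (td->nonsleepable_held > 0) {
		td->violations++;
		return (false);
	}
	return (true);
}

/* Stand-in for em_if_multi_set() -> ... -> e1000_write_phy_reg_mdic(). */
static bool
model_em_multi_set(struct model_thread *td)
{
	return (model_pause(td));
}

/* Pattern from the trace: rmlock held across the call into the port. */
static bool
model_lagg_ioctl_rmlock(struct model_thread *td)
{
	bool ok;

	model_rm_wlock(td);
	ok = model_em_multi_set(td);
	model_rm_wunlock(td);
	return (ok);
}

/*
 * Alternative: only a sleepable lock (e.g. an sx(9) lock) is held
 * across the call. It is not modeled because it does not count
 * against sleeping.
 */
static bool
model_lagg_ioctl_sleepable(struct model_thread *td)
{
	return (model_em_multi_set(td));
}
```

On a zero-initialized model_thread, model_lagg_ioctl_rmlock() returns false and records one violation, while model_lagg_ioctl_sleepable() returns true; in both cases no lock is left held afterwards.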