From: Stephen Hurd
Date: Tue, 8 May 2018 14:58:00 -0400
Subject: Re: iflib-if_em tests with HEAD and lagg panic [Was: Re: svn commit: r333338 - in stable/11/sys: dev/bnxt kern net sys]
To: Harry Schmalzbauer
Cc: Sean Bruno, Kevin Bowling, "freebsd-net@freebsd.org", Stephen Hurd

Can you test the review here:
https://reviews.freebsd.org/D15355

It looks like there are two different locks protecting the same data everywhere but in lagg_ioctl(). This is a rough first pass, and there may be some lingering recursion and performance regressions with it.
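The panic reported in the quoted message below has a recognizable shape: one thread holds the if_lagg rmlock exclusively (a non-sleepable lock) and calls into em(4), whose PHY register write path sleeps via pause_sbt(); a second thread then blocks on that same rmlock in lagg_transmit(), and the kernel panics with "sleeping thread". The userland C sketch below models only that hazard and one generic way around it, serializing configuration with a sleepable lock in the spirit of sx(9). It is not the D15355 change: every `model_*` name is hypothetical, and the WITNESS-style check is a simplified stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userland model, not kernel code: per-thread counts of held
 * locks by class, loosely mimicking the check WITNESS performs before a sleep.
 */
static _Thread_local int nonsleepable_held; /* e.g. rmlock, mutex */
static _Thread_local int sleepable_held;    /* e.g. sx */

static void model_rm_wlock(void) { nonsleepable_held++; }
static void model_rm_wunlock(void)
{
	assert(nonsleepable_held > 0);
	nonsleepable_held--;
}

static void model_sx_xlock(void) { sleepable_held++; }
static void model_sx_xunlock(void)
{
	assert(sleepable_held > 0);
	sleepable_held--;
}

/* Models a pause(9)-style sleep: refused (returns false) if a non-sleepable lock is held. */
static bool model_pause(const char *wmesg)
{
	if (nonsleepable_held > 0) {
		fprintf(stderr, "Sleeping on \"%s\" with %d non-sleepable lock(s) held\n",
		    wmesg, nonsleepable_held);
		return false;
	}
	return true; /* a real kernel would actually sleep here */
}

/* Stand-in for a port's multicast update, which may sleep while writing PHY registers. */
static bool model_port_multi_set(void)
{
	return model_pause("e1000_delay");
}

/* Pattern in the backtrace: exclusive rmlock held across the port ioctl. */
static bool model_lagg_multi_rmlock(void)
{
	model_rm_wlock();
	bool ok = model_port_multi_set();
	model_rm_wunlock();
	return ok;
}

/* Alternative pattern: serialize configuration changes with a sleepable lock. */
static bool model_lagg_multi_sx(void)
{
	model_sx_xlock();
	bool ok = model_port_multi_set();
	model_sx_xunlock();
	return ok;
}
```

Calling `model_lagg_multi_rmlock()` prints the warning and returns false, mirroring the WITNESS report; `model_lagg_multi_sx()` returns true. Stephen's note that two locks protect the same data suggests a split of this general kind (a cheap lock for the transmit path, a heavier one for configuration), but the actual locking in D15355 may differ.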
On Tue, May 8, 2018 at 1:37 PM, Harry Schmalzbauer wrote:
> Regarding Sean Bruno's message of 08.05.2018 18:44 (localtime):
> >
> >
> > On 05/08/18 10:23, Harry Schmalzbauer wrote:
> >> Regarding Kevin Bowling's message of 08.05.2018 11:52 (localtime):
> >> …
> >>>> But if the simple iflib/hw-support test with kawela+hartwell helps I'm
> >>>> happy to do.
> >>>
> >>> At this point it would be helpful, we think e1000 is nearing pretty
> >>> good shape and I need to become familiar with any outstanding bugs.
> >>
> >> I started with hartwell:
> >> em1: attach_pre capping queues at 2
> >>
> >> Current cap: 0x460b
> >> em1: using 1024 tx descriptors and 1024 rx descriptors
> >> em1: msix_init qsets capped at 2
> >> em1: pxm cpus: 2 queue msgs: 4 admincnt: 1
> >> em1: using 2 rx queues 2 tx queues
> >> em1: Using MSIX interrupts with 3 vectors
> >> em1: allocated for 2 tx_queues
> >> em1: allocated for 2 rx_queues
> >> em1: Ethernet address: 00:1b:21:3e:90:52
> >> em1: netmap queues/slots: TX 2/1024, RX 2/1024
> >> dev.em.1.iflib.driver_version: 7.6.1-k
> >> dev.em.1.queue_rx_1.rx_irq: 0
> >> dev.em.1.queue_rx_1.rxd_tail: 607
> >> dev.em.1.queue_rx_1.rxd_head: 21
> >> dev.em.1.queue_rx_0.rx_irq: 0
> >> dev.em.1.queue_rx_0.rxd_tail: 410
> >> dev.em.1.queue_rx_0.rxd_head: 412
> >> dev.em.1.queue_tx_1.tx_irq: 0
> >> dev.em.1.queue_tx_1.txd_tail: 8
> >> dev.em.1.queue_tx_1.txd_head: 8
> >> dev.em.1.queue_tx_0.tx_irq: 0
> >> dev.em.1.queue_tx_0.txd_tail: 428
> >> dev.em.1.queue_tx_0.txd_head: 428
> >>
> >> Looks good so far, no problems with simple line speed (NFS4) copies.
> >>
> >> According to the i217 (Clarkville) datasheet, it also supports 2 queues:
> >> Table 63. Intel® Ethernet Controller I217 Capability PHY Address 01,
> >> Page 776, Register 19
> >> But it probably was never supported, at least I haven't ever checked
> >> pre-iflib.
> >>
> >> Here's the Clarkville:
> >> em0: attach_pre capping queues at 1
> >> em0: using 1024 tx descriptors and 1024 rx descriptors
> >> em0: msix_init qsets capped at
> >> em0: PCIY_MSIX capability not found; or rid 0 == 0.
> >> em0: Using an MSI interrupt
> >> em0: allocated for 1 tx_queues
> >> em0: allocated for 1 rx_queues
> >> em0: Ethernet address: 54:be:f7:0b:d7:4e
> >> em0: netmap queues/slots: TX 1/1024, RX 1/1024
> >>
> >> Since it's no effort here, I also tried LACP, which panicked.
> >> vmcore available, but what debugger to use these days? kgdb seems to be
> >> replaced...
> >>
> >> -harry
> >
> > /usr/libexec/kgdb should be the old kgdb that you are used to. Most of
> > us have switched to using devel/gdb from ports.
>
> Thanks, me stupid – it's in libexec, not in my path...
> Unfortunately I have no clue about those essential C tools, so it
> doesn't make much sense for me to waste energy installing devel/gdb ;-)
> While I'm wondering why/how LLVM/gdb can be mixed...
> pure lack of essentials :-(
>
> So back to iflib-if_em panic after setting up an if_lagg(4) interface
> (which consists of an add-on 82574 and the on-board (PCH)+i217 NIC, which
> was assigned a locally administered Ethernet address and used as first
> laggport, so the private MAC was (successfully) set on both NICs)
> and firing dhclient to get a lease:
>
>
> Sleeping on "e1000_delay" with the following non-sleepable locks held:
> exclusive rm if_lagg rmlock (if_lagg rmlock) r = 0 (0xfffff80014228c08) locked @ /usr/src/sys/net/if_lagg.c:1433
> stack backtrace:
> #0 0xffffffff80701113 at witness_debugger+0x73
> #1 0xffffffff807024f1 at witness_warn+0x461
> #2 0xffffffff806a42cc at _sleep+0x6c
> #3 0xffffffff806a4b34 at pause_sbt+0x144
> #4 0xffffffff80440e21 at e1000_write_phy_reg_mdic+0xf1
> #5 0xffffffff804446bf at e1000_enable_phy_wakeup_reg_access_bm+0x2f
> #6 0xffffffff80432e0a at e1000_update_mc_addr_list_pch2lan+0x3a
> #7 0xffffffff8041408f at em_if_multi_set+0x1bf
> #8 0xffffffff807bc02e at iflib_if_ioctl+0xfe
> #9 0xffffffff82111a15 at lagg_ioctl+0x115
> #10 0xffffffff807dd348 at inm_release_task+0x218
> #11 0xffffffff806dea29 at gtaskqueue_run_locked+0x139
> #12 0xffffffff806de7a8 at gtaskqueue_thread_loop+0x88
> #13 0xffffffff80659d84 at fork_exit+0x84
> #14 0xffffffff809b767e at fork_trampoline+0xe
> Sleeping thread (tid 100017, pid 0) owns a non-sleepable lock
> KDB: stack backtrace of thread 100017:
> sched_switch() at sched_switch+0x945/frame 0xfffffe00750dc5d0
> mi_switch() at mi_switch+0x18c/frame 0xfffffe00750dc600
> sleepq_switch() at sleepq_switch+0x10d/frame 0xfffffe00750dc640
> sleepq_timedwait() at sleepq_timedwait+0x50/frame 0xfffffe00750dc680
> _sleep() at _sleep+0x307/frame 0xfffffe00750dc730
> pause_sbt() at pause_sbt+0x144/frame 0xfffffe00750dc780
> e1000_write_phy_reg_mdic() at e1000_write_phy_reg_mdic+0xf1/frame 0xfffffe00750dc7c0
> e1000_enable_phy_wakeup_reg_access_bm() at e1000_enable_phy_wakeup_reg_access_bm+0x2f/frame
> 0xfffffe00750dc7e0
> e1000_update_mc_addr_list_pch2lan() at e1000_update_mc_addr_list_pch2lan+0x3a/frame 0xfffffe00750dc820
> em_if_multi_set() at em_if_multi_set+0x1bf/frame 0xfffffe00750dc870
> iflib_if_ioctl() at iflib_if_ioctl+0xfe/frame 0xfffffe00750dc8e0
> lagg_ioctl() at lagg_ioctl+0x115/frame 0xfffffe00750dc990
> inm_release_task() at inm_release_task+0x218/frame 0xfffffe00750dc9f0
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame 0xfffffe00750dca40
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe00750dca70
> fork_exit() at fork_exit+0x84/frame 0xfffffe00750dcab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00750dcab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> panic: sleeping thread
> cpuid = 3
> time = 1525794682
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008fe180e0
> vpanic() at vpanic+0x1a3/frame 0xfffffe008fe18140
> panic() at panic+0x43/frame 0xfffffe008fe181a0
> propagate_priority() at propagate_priority+0x335/frame 0xfffffe008fe181e0
> turnstile_wait() at turnstile_wait+0x38d/frame 0xfffffe008fe18230
> __mtx_lock_sleep() at __mtx_lock_sleep+0x1e1/frame 0xfffffe008fe182b0
> __mtx_lock_flags() at __mtx_lock_flags+0xf9/frame 0xfffffe008fe18300
> _rm_rlock() at _rm_rlock+0x280/frame 0xfffffe008fe18330
> _rm_rlock_debug() at _rm_rlock_debug+0x14c/frame 0xfffffe008fe18380
> lagg_transmit() at lagg_transmit+0x38/frame 0xfffffe008fe183f0
> ether_output_frame() at ether_output_frame+0xaa/frame 0xfffffe008fe18420
> ether_output() at ether_output+0x68b/frame 0xfffffe008fe184c0
> arprequest() at arprequest+0x474/frame 0xfffffe008fe185c0
> arp_ifinit() at arp_ifinit+0x58/frame 0xfffffe008fe18600
> ether_ioctl() at ether_ioctl+0x1d1/frame 0xfffffe008fe18630
> lagg_ioctl() at lagg_ioctl+0x602/frame 0xfffffe008fe186e0
> in_control() at in_control+0x8f5/frame 0xfffffe008fe18780
> ifioctl() at ifioctl+0x19c6/frame 0xfffffe008fe18850
> kern_ioctl() at kern_ioctl+0x2b9/frame 0xfffffe008fe188b0
> sys_ioctl() at sys_ioctl+0x168/frame 0xfffffe008fe18980
> amd64_syscall() at amd64_syscall+0x2cc/frame 0xfffffe008fe18ab0
> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe008fe18ab0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004820ba, rsp = 0x7fffffffe1c8, rbp = 0x7fffffffe210 ---
> KDB: enter: panic
>
>
> Hope this helps,
>
> -harry
>

-- 
Stephen Hurd
Principal Engineer, Limelight Networks
EXPERIENCE FIRST.
+1 616 848 0643
www.limelight.com
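The em1 sysctls quoted above report head and tail indices for each 1024-descriptor ring. A minimal sketch for turning such a snapshot into a count, assuming the usual e1000 convention that hardware consumes descriptors at head and software publishes new ones at tail, so that (tail - head) mod N descriptors are currently handed to the hardware. `ring_in_flight` is a hypothetical helper, not part of em(4) or iflib.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of descriptors from head up to (but not
 * including) tail in a ring of ndesc entries. Under the assumed e1000
 * convention, this is how many descriptors are currently owned by hardware.
 * head == tail yields 0.
 */
static uint32_t ring_in_flight(uint32_t head, uint32_t tail, uint32_t ndesc)
{
	assert(ndesc > 0 && head < ndesc && tail < ndesc);
	/* Add ndesc before subtracting so the unsigned result never wraps. */
	return (tail + ndesc - head) % ndesc;
}
```

Applied to the quoted values, both transmit queues (428/428 and 8/8) give 0, which under that convention means nothing was pending transmission when the sysctls were read; queue_rx_0 (head 412, tail 410) gives 1022 and queue_rx_1 (head 21, tail 607) gives 586. These are single point-in-time readings, so they say little about steady-state behavior.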