Date:      Sun, 22 Jul 2018 20:02:38 +0200
From:      Marius Strobl <marius@freebsd.org>
To:        Alexander Leidinger <Alexander@leidinger.net>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r336313 - in head/sys: dev/bnxt dev/e1000 dev/ixgbe dev/ixl net sys
Message-ID:  <20180722180238.GY21523@alchemy.franken.de>
In-Reply-To: <20180718223313.Horde.lYE8PRYqLdkrN3QMTTHx3aV@webmail.leidinger.net>
References:  <201807151904.w6FJ4NNg039896@repo.freebsd.org> <20180718223313.Horde.lYE8PRYqLdkrN3QMTTHx3aV@webmail.leidinger.net>

On Wed, Jul 18, 2018 at 10:33:13PM +0200, Alexander Leidinger wrote:
> Quoting Marius Strobl <marius@freebsd.org> (from Sun, 15 Jul 2018  
> 19:04:23 +0000 (UTC)):
> 
> > Author: marius
> > Date: Sun Jul 15 19:04:23 2018
> > New Revision: 336313
> > URL: https://svnweb.freebsd.org/changeset/base/336313
> >
> > Log:
> >   Assorted TSO fixes for em(4)/iflib(9) and dead code removal:
> [...]
> >   Okayed by:	sbruno@ at 201806 DevSummit Transport Working Group [1]
> >   Reviewed by:	sbruno (earlier version), erj
> >   PR:	219428 (part of; comment #10) [1], 220997 (part of; comment #3)
> 
> Hi Marius,
> 
> thanks a lot for this change, it improves the situation (PR 220997) a
> lot. The system is running r336329, so it does not have your change
> r336356 yet. Maybe the 2 panics I've seen (more below) are fixed by
> it. Before I try your second change (surely not before the weekend),
> here is at least the report, in case it is related to your changes
> and not related to r336313:
> 
> I got 2 panics, both within 6 minutes (based upon the timestamp of the  
> coredumps in the filesystem):
> 
> 1)
> panic: Assertion ifsd_m[next] == NULL failed at /usr/src/sys/net/iflib.c:3151
> cpuid = 2
> time = 1531944124
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008af85850
> vpanic() at vpanic+0x1a3/frame 0xfffffe008af858b0
> doadump() at doadump/frame 0xfffffe008af85930
> iflib_txq_drain() at iflib_txq_drain+0xe58/frame 0xfffffe008af85aa0
> ifmp_ring_check_drainage() at ifmp_ring_check_drainage+0x16c/frame 0xfffffe008af85b00
> _task_fn_tx() at _task_fn_tx+0x76/frame 0xfffffe008af85b30
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame 0xfffffe008af85b80
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe008af85bb0
> fork_exit() at fork_exit+0x84/frame 0xfffffe008af85bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe008af85bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> Uptime: 1d22h51m17s
> Dumping 2990 out of 8037 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at ./machine/pcpu.h:230
> 230             __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) #0  __curthread () at ./machine/pcpu.h:230
> #1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
> #2  0xffffffff80485ea1 in kern_reboot (howto=260)
>      at /usr/src/sys/kern/kern_shutdown.c:446
> #3  0xffffffff80486483 in vpanic (fmt=<optimized out>, ap=0xfffffe008af858f0)
>      at /usr/src/sys/kern/kern_shutdown.c:863
> #4  0xffffffff804861f0 in kassert_panic (
>      fmt=0xffffffff807e085f "Assertion %s failed at %s:%d")
>      at /usr/src/sys/kern/kern_shutdown.c:749
> #5  0xffffffff8059cd78 in iflib_busdma_load_mbuf_sg (flags=0,
>      txq=<optimized out>, tag=<optimized out>, map=<optimized out>,
>      m0=<optimized out>, segs=<optimized out>, nsegs=<optimized out>,
>      max_segs=<optimized out>) at /usr/src/sys/net/iflib.c:3151
> #6  iflib_encap (txq=0xfffff800028dc000, m_headp=0xfffffe00959bdd30)
>      at /usr/src/sys/net/iflib.c:3321
> #7  iflib_txq_drain (r=0xfffffe00959ba000, cidx=<optimized out>,
>      pidx=41319936) at /usr/src/sys/net/iflib.c:3636
> #8  0xffffffff805a0f4c in drain_ring_lockless (r=<optimized out>, os=...,
>      prev=<optimized out>, budget=<optimized out>)
>      at /usr/src/sys/net/mp_ring.c:199
> #9  ifmp_ring_check_drainage (r=<optimized out>, budget=32)
>      at /usr/src/sys/net/mp_ring.c:502
> #10 0xffffffff80599c46 in _task_fn_tx (context=<optimized out>)
>      at /usr/src/sys/net/iflib.c:3747
> #11 0xffffffff804cd2c9 in gtaskqueue_run_locked (queue=0xfffff800025e0d00)
>      at /usr/src/sys/kern/subr_gtaskqueue.c:332
> #12 0xffffffff804cd048 in gtaskqueue_thread_loop (arg=<optimized out>)
>      at /usr/src/sys/kern/subr_gtaskqueue.c:507
> #13 0xffffffff8044cc34 in fork_exit (
>      callout=0xffffffff804ccfc0 <gtaskqueue_thread_loop>,
>      arg=0xfffffe0007ffd038, frame=0xfffffe008af85c00)
>      at /usr/src/sys/kern/kern_fork.c:1057
> (kgdb) up 5
> #5  0xffffffff8059cd78 in iflib_busdma_load_mbuf_sg (flags=0,
>      txq=<optimized out>, tag=<optimized out>, map=<optimized out>,
>      m0=<optimized out>, segs=<optimized out>, nsegs=<optimized out>,
>      max_segs=<optimized out>) at /usr/src/sys/net/iflib.c:3151
> 3151                            MPASS(ifsd_m[next] == NULL);
> (kgdb) list
> 3146                            /*
> 3147                             * see if we can't be smarter about physically
> 3148                             * contiguous mappings
> 3149                             */
> 3150                            next = (pidx + count) & (ntxd-1);
> 3151                            MPASS(ifsd_m[next] == NULL);
> 3152    #if MEMORY_LOGGING
> 3153                            txq->ift_enqueued++;
> 3154    #endif
> 3155                            ifsd_m[next] = m;
> (kgdb) print ifsd_m
> $1 = (struct mbuf **) 0xfffffe00959b8000
> (kgdb) print next
> $2 = <optimized out>
> (kgdb) print pidx
> $3 = 277
> (kgdb) print count
> $4 = 0
> (kgdb) print ntxd
> $5 = <optimized out>
> 
> 
> 2)
> Unread portion of the kernel message buffer:
> panic: Assertion ifsd_m[next] == NULL failed at /usr/src/sys/net/iflib.c:3151
> cpuid = 2
> time = 1531944550
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008af85850
> vpanic() at vpanic+0x1a3/frame 0xfffffe008af858b0
> doadump() at doadump/frame 0xfffffe008af85930
> iflib_txq_drain() at iflib_txq_drain+0xe58/frame 0xfffffe008af85aa0
> ifmp_ring_check_drainage() at ifmp_ring_check_drainage+0x16c/frame 0xfffffe008af85b00
> _task_fn_tx() at _task_fn_tx+0x76/frame 0xfffffe008af85b30
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x139/frame 0xfffffe008af85b80
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe008af85bb0
> fork_exit() at fork_exit+0x84/frame 0xfffffe008af85bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe008af85bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> Uptime: 5m27s
> Dumping 1555 out of 8037 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at ./machine/pcpu.h:230
> 230             __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) bt
> #0  __curthread () at ./machine/pcpu.h:230
> #1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
> #2  0xffffffff80485ea1 in kern_reboot (howto=260)
>      at /usr/src/sys/kern/kern_shutdown.c:446
> #3  0xffffffff80486483 in vpanic (fmt=<optimized out>, ap=0xfffffe008af858f0)
>      at /usr/src/sys/kern/kern_shutdown.c:863
> #4  0xffffffff804861f0 in kassert_panic (
>      fmt=0xffffffff807e085f "Assertion %s failed at %s:%d")
>      at /usr/src/sys/kern/kern_shutdown.c:749
> #5  0xffffffff8059cd78 in iflib_busdma_load_mbuf_sg (flags=0,
>      txq=<optimized out>, tag=<optimized out>, map=<optimized out>,
>      m0=<optimized out>, segs=<optimized out>, nsegs=<optimized out>,
>      max_segs=<optimized out>) at /usr/src/sys/net/iflib.c:3151
> #6  iflib_encap (txq=0xfffff800028fe000, m_headp=0xfffffe00959bdde8)
>      at /usr/src/sys/net/iflib.c:3321
> #7  iflib_txq_drain (r=0xfffffe00959ba000, cidx=<optimized out>,
>      pidx=42948608) at /usr/src/sys/net/iflib.c:3636
> #8  0xffffffff805a0f4c in drain_ring_lockless (r=<optimized out>, os=...,
>      prev=<optimized out>, budget=<optimized out>)
>      at /usr/src/sys/net/mp_ring.c:199
> #9  ifmp_ring_check_drainage (r=<optimized out>, budget=32)
>      at /usr/src/sys/net/mp_ring.c:502
> #10 0xffffffff80599c46 in _task_fn_tx (context=<optimized out>)
>      at /usr/src/sys/net/iflib.c:3747
> #11 0xffffffff804cd2c9 in gtaskqueue_run_locked (queue=0xfffff800025a2200)
>      at /usr/src/sys/kern/subr_gtaskqueue.c:332
> #12 0xffffffff804cd048 in gtaskqueue_thread_loop (arg=<optimized out>)
>      at /usr/src/sys/kern/subr_gtaskqueue.c:507
> #13 0xffffffff8044cc34 in fork_exit (
>      callout=0xffffffff804ccfc0 <gtaskqueue_thread_loop>,
>      arg=0xfffffe0007ffd038, frame=0xfffffe008af85c00)
>      at /usr/src/sys/kern/kern_fork.c:1057
> #14 <signal handler called>
> (kgdb) up 5
> #5  0xffffffff8059cd78 in iflib_busdma_load_mbuf_sg (flags=0,
>      txq=<optimized out>, tag=<optimized out>, map=<optimized out>,
>      m0=<optimized out>, segs=<optimized out>, nsegs=<optimized out>,
>      max_segs=<optimized out>) at /usr/src/sys/net/iflib.c:3151
> 3151                            MPASS(ifsd_m[next] == NULL);
> (kgdb) print ifsd_m
> $1 = (struct mbuf **) 0xfffffe00959b8000
> (kgdb) print pidx
> $2 = 707
> (kgdb) print count
> $3 = 0

Hrm, so far I can neither see how iflib(9) could get into that state nor
have I succeeded in reproducing the panic, not even with a LEM-class
MAC. Is that an old or a new problem? If the latter, please try with
r336612. The fix in r336356 is only relevant for IGB-class devices, so
it doesn't apply to your machine unless the above panics are from
hardware other than what PR 220997 is about.
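
For reference, the check that fires in both dumps can be reduced to the
following sketch. It is not the iflib(9) code: struct tx_ring, ring_next()
and ring_claim() are hypothetical names made up for illustration, and the
ring size is an assumption because ntxd is <optimized out> in the dumps.
It only mirrors the two quoted lines, i.e. the masked index computation
and the requirement that the slot is still NULL before an mbuf is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a TX software descriptor array such as
 * ifsd_m: one pointer per descriptor, NULL while the slot is free.
 */
struct tx_ring {
	void	**slots;
	uint32_t  ntxd;		/* ring size; assumed to be a power of two */
};

static int
is_pow2(uint32_t n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}

/* Same arithmetic as "next = (pidx + count) & (ntxd-1)" in the listing. */
static uint32_t
ring_next(uint32_t pidx, uint32_t count, uint32_t ntxd)
{
	assert(is_pow2(ntxd));	/* masking only wraps correctly for 2^n */
	return ((pidx + count) & (ntxd - 1));
}

/*
 * Store m in the computed slot. Returns 0 on success and -1 if the
 * slot is still occupied, which is the condition the MPASS() in the
 * quoted listing turns into a panic.
 */
static int
ring_claim(struct tx_ring *r, uint32_t pidx, uint32_t count, void *m)
{
	uint32_t next = ring_next(pidx, count, r->ntxd);

	if (r->slots[next] != NULL)
		return (-1);
	r->slots[next] = m;
	return (0);
}
```

With count = 0, as printed in both sessions, and any power-of-two ring
size larger than pidx, next is simply pidx (277 and 707 respectively).
In terms of this sketch, the assertion means the very first slot of the
packet still held an mbuf pointer when the next packet was encapsulated.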

Marius




