Date: Tue, 31 Mar 2020 17:57:40 +0200
From: "Kristof Provost" <kp@FreeBSD.org>
To: "Mark Johnston" <markj@freebsd.org>
Cc: "Li-Wen Hsu" <lwhsu@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all <svn-src-all@freebsd.org>, svn-src-head <svn-src-head@freebsd.org>
Subject: Re: svn commit: r359436 - in head/sys: kern net sys
Message-ID: <49973196-5F08-4DCE-BA5F-F9B359703A08@FreeBSD.org>
In-Reply-To: <9A4C20AA-8E13-47C8-B162-F2304F8C79B7@FreeBSD.org>
References: <202003301422.02UEMrxL059978@repo.freebsd.org> <CAKBkRUxrzmqkDrsPXLWr%2B5d6djghR1jbr_Lg5RpvpanAzOxtKw@mail.gmail.com> <20200331015905.GC65028@raichu> <20200331023127.GA97238@raichu> <CAKBkRUyi5At9bwXAH7Sw3xb=KZXTBHOpjpuRKvAMhbxpnSwb2A@mail.gmail.com> <D538EC06-2F66-4638-BD1A-65B27B16C35A@FreeBSD.org> <CAKBkRUwr1L4iO_%2BY4aNupXQCuFsSHtO-Y1L=PQbPKmYMU=tpcA@mail.gmail.com> <20200331151700.GC97238@raichu> <9A4C20AA-8E13-47C8-B162-F2304F8C79B7@FreeBSD.org>
On 31 Mar 2020, at 17:28, Kristof Provost wrote:
> On 31 Mar 2020, at 17:17, Mark Johnston wrote:
>> On Tue, Mar 31, 2020 at 03:51:27PM +0800, Li-Wen Hsu wrote:
>>> On Tue, Mar 31, 2020 at 3:00 PM Kristof Provost <kp@freebsd.org>
>>> wrote:
>>>>
>>>> On 31 Mar 2020, at 7:56, Li-Wen Hsu wrote:
>>>>> On Tue, Mar 31, 2020 at 10:55 AM Mark Johnston <markj@freebsd.org>
>>>>> wrote:
>>>>>>>> It seems could be triggered by sys.netinet6.frag6.*
>>>>>>>> sys.netpfil.common.* sbin.pfctl.pfctl_test.* tests, and there
>>>>>>>> are lots of test cases timed out.
>>>>>>>>
>>>>>>>> Can you help check these?
>>>>>>>
>>>>>>> I see, it is actually caused by r359438. I'm looking at it now.
>>>>>>
>>>>>> I verified that the netpfil and netinet6 tests pass with r359477.
>>>>>
>>>>> Thanks for the fixing, the latest test panics at epair_qflush:
>>>>>
>>>>> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14747/consoleFull
>>>>>
>>>>> while executing sys.netpfil.pf.* tests. I'm not sure if this is
>>>>> related or because of previous commits (I suspect the later). I'll
>>>>> look into this.
>>>>>
>>>> That’s a know issue with epair (since EPOCH, I believe).
>>>> A number of the pf tests are disabled due to this. See 238870.
>>>
>>> I also think so, btw, currently every test run panics so I am afraid
>>> that the recent commits might make status worse (or say, make the
>>> issue easier to reproduce?)
>>
>> I haven't been able to reproduce any panics or test failures so far.
>
> Once you disable the ‘atf_skip’ lines in the pf tests a simple
> `sudo kldload pfsync && cd /usr/tests/sys/netpfil/pf && sudo kyua
> test` is likely sufficient.
>
The names:names test is a great candidate for this. Remove the
`atf_skip …` line in /usr/tests/sys/netpfil/pf/names and run that a few
times. It’s not 100% reliable, but the test is very fast and will likely
panic every other run or more.

Example backtrace:

panic: epair_qflush: ifp=0xfffff800079c9000, epair_softc gone? sc=0
cpuid = 1
time = 1585666518
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001bd7e790
vpanic() at vpanic+0x182/frame 0xfffffe001bd7e7e0
panic() at panic+0x43/frame 0xfffffe001bd7e840
epair_qflush() at epair_qflush+0x1a8/frame 0xfffffe001bd7e890
if_down() at if_down+0x12d/frame 0xfffffe001bd7e8c0
if_detach_internal() at if_detach_internal+0x2ee/frame 0xfffffe001bd7e920
if_vmove() at if_vmove+0x3c/frame 0xfffffe001bd7e970
vnet_if_return() at vnet_if_return+0x50/frame 0xfffffe001bd7e990
vnet_destroy() at vnet_destroy+0x130/frame 0xfffffe001bd7e9c0
prison_deref() at prison_deref+0x29d/frame 0xfffffe001bd7ea00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe001bd7ea80
taskqueue_thread_loop() at taskqueue_thread_loop+0x94/frame 0xfffffe001bd7eab0
fork_exit() at fork_exit+0x80/frame 0xfffffe001bd7eaf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001bd7eaf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100014 ]
Stopped at kdb_enter+0x37: movq $0,0x10927a6(%rip)
db>

You might see different panics too. The epair teardown flow is complex,
and broken.

Best regards,
Kristof
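
For anyone who wants to script the "run that a few times" step, here is a minimal sketch of a repeat loop around the names:names test. It assumes the `atf_skip …` line has already been removed by hand from /usr/tests/sys/netpfil/pf/names. The `DRY_RUN` switch, the default of 10 runs, and the helper names `run` and `repro_names` are made up for illustration and are not part of the test suite. By default it only prints the commands. A successful reproduction panics the machine, so it should only be switched on with `DRY_RUN=0` on a disposable test box.

```shell
#!/bin/sh
# Sketch: re-run the pf names:names test to provoke the epair teardown panic.
# Hypothetical helpers; DRY_RUN, run() and repro_names() are illustrative only.

TESTDIR=/usr/tests/sys/netpfil/pf
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = really execute them

# Print a command, and execute it only when DRY_RUN is 0.
run() {
	echo "+ $*"
	if [ "$DRY_RUN" -eq 0 ]; then
		"$@"
	fi
}

# Load pfsync once, then run names:names the requested number of times.
repro_names() {
	runs=${1:-10}
	run sudo kldload pfsync
	i=1
	while [ "$i" -le "$runs" ]; do
		echo "run $i of $runs"
		run sh -c "cd $TESTDIR && sudo kyua test names:names"
		i=$((i + 1))
	done
}

repro_names "$@"
```

Saved under a name of your choice, for example `repro.sh`, `DRY_RUN=0 sh repro.sh 20` would attempt up to 20 runs. If the box panics earlier, the loop simply ends with it.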