From: Hans Petter Selasky
To: Matthew Macy, Roman Bogorodskiy
Cc: freebsd-current@freebsd.org
Subject: Re: panic after ifioctl/if_clone_destroy
Date: Mon, 6 Aug 2018 09:21:26 +0200
References: <20180805153556.GA1957@kloomba>
List-Id: Discussions about the use of FreeBSD-current

Hi,

I think the problem is that the thread pointed to by tdwait has exited. I
would say it is not allowed to peek into the other records' threads, because
they may change under the hood and are not protected by the current context.
> if (record->er_cpuid != curcpu) {

This optimisation is invalid or needs to be revisited:

>     /*
>      * If the head of the list is running, we can wait for it
>      * to remove itself from the list and thus save us the
>      * overhead of a migration
>      */
>     if ((tdwait = TAILQ_FIRST(&record->er_tdlist)) != NULL &&
>         TD_IS_RUNNING(tdwait->et_td)) {
>         gen = record->er_gen;
>         thread_unlock(td);
>         do {
>             cpu_spinwait();
>         } while (tdwait == TAILQ_FIRST(&record->er_tdlist) &&
>             gen == record->er_gen && TD_IS_RUNNING(tdwait->et_td) &&
>             spincount++ < MAX_ADAPTIVE_SPIN);
>         thread_lock(td);
>         return;
>     }

--HPS

On 08/05/18 22:01, Matthew Macy wrote:
> If you could give me a self-contained reproducer that would expedite a fix.
>
> Thanks.
> -M
>
> On Sun, Aug 5, 2018 at 08:36 Roman Bogorodskiy wrote:
>
>> Running -CURRENT r336863 on amd64. Get the following panic right after
>> (or during) boot:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 04
>> fault virtual address   = 0xdeadc2ff
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff80bd7858
>> stack pointer           = 0x28:0xfffffe008b445580
>> frame pointer           = 0x28:0xfffffe008b4455c0
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 903 (libvirtd)
>>
>> Traceback is:
>>
>> (kgdb) #0  doadump (textdump=0) at pcpu.h:230
>> #1  0xffffffff8043dc7b in db_dump (dummy=,
>>     dummy2=, dummy3=,
>>     dummy4=) at /usr/src/sys/ddb/db_command.c:574
>> #2  0xffffffff8043da49 in db_command (cmd_table=)
>>     at /usr/src/sys/ddb/db_command.c:481
>> #3  0xffffffff8043d7c4 in db_command_loop ()
>>     at /usr/src/sys/ddb/db_command.c:534
>> #4  0xffffffff804409ef in db_trap (type=,
>>     code=) at /usr/src/sys/ddb/db_main.c:252
>> #5  0xffffffff80bdd513 in kdb_trap (type=12, code=0, tf=> out>)
>>     at /usr/src/sys/kern/subr_kdb.c:693
>> #6  0xffffffff810769f1 in trap_fatal
>>     (frame=0xfffffe008b4454c0,
>>     eva=3735929599)
>>     at /usr/src/sys/amd64/amd64/trap.c:884
>> #7  0xffffffff81076b12 in trap_pfault (frame=0xfffffe008b4454c0,
>>     usermode=) at pcpu.h:230
>> #8  0xffffffff8107611a in trap (frame=0xfffffe008b4454c0)
>>     at /usr/src/sys/amd64/amd64/trap.c:427
>> #9  0xffffffff810518ac in calltrap ()
>>     at /usr/src/sys/amd64/amd64/exception.S:230
>> #10 0xffffffff80bd7858 in epoch_block_handler_preempt (
>>     global=, cr=0xfffffe00760c3a00,
>>     arg=) at /usr/src/sys/kern/subr_epoch.c:256
>> #11 0xffffffff803994fd in ck_epoch_synchronize_wait (
>>     global=0xfffff800030c5680,
>>     cb=0xffffffff80bd77a0 , ct=0x0)
>>     at /usr/src/sys/contrib/ck/src/ck_epoch.c:407
>> #12 0xffffffff80bd7630 in epoch_wait_preempt (epoch=0xfffff800030c5680)
>>     at /usr/src/sys/kern/subr_epoch.c:389
>> #13 0xffffffff80c983bf in if_delgroup (ifp=0xfffff80003aab800,
>>     groupname=0xfffff80005ff5e00 "bridge") at /usr/src/sys/net/if.c:1514
>> #14 0xffffffff80c9f2b2 in if_clone_destroyif (ifc=0xfffff80005ff5e00,
>>     ifp=0xfffff80003aab800) at /usr/src/sys/net/if_clone.c:325
>> #15 0xffffffff80c9f0d5 in if_clone_destroy (name=0xfffffe008b4458d0
>>     "virbr0")
>>     at /usr/src/sys/net/if_clone.c:288
>> #16 0xffffffff80c9a2c3 in ifioctl (so=0xfffff80007edca38, cmd=2149607801,
>>     data=, td=)
>>     at /usr/src/sys/net/if.c:3053
>> #17 0xffffffff80c04259 in kern_ioctl (td=0xfffff80007c1a580,
>>     fd=, com=,
>>     data=) at file.h:330
>> #18 0xffffffff80c03f2e in sys_ioctl (td=0xfffff80007c1a580,
>>     uap=0xfffff80007c1a940) at /usr/src/sys/kern/sys_generic.c:712
>> #19 0xffffffff81077401 in amd64_syscall (td=0xfffff80007c1a580, traced=0)
>>     at subr_syscall.c:135
>> #20 0xffffffff8105218d in fast_syscall_common ()
>>     at /usr/src/sys/amd64/amd64/exception.S:500
>> #21 0x00000008028f4c0a in ?? ()
>>
>> Previous frame inner to this frame (corrupt stack?)
>>
>> Current language: auto; currently minimal
>>
>> (kgdb)
>>
>> It looks like panic happens during network interfaces related
>> operations.
>> Couple of dmesg lines before panic:
>>
>> Aug 5 19:02:42 romashka rtsold[585]: interface
>> bridge0 removed
>> Aug 5 19:02:42 romashka kernel: bridge0: Ethernet address:
>> 02:af:41:48:c7:00
>> Aug 5 19:02:42 romashka kernel: bridge0: changing name to 'virbr-ab'
>> Aug 5 19:02:42 romashka kernel: tap0: Ethernet address: 00:bd:8d:11:f7:00
>> Aug 5 19:02:42 romashka kernel: tap0: link state changed to UP
>> Aug 5 19:02:42 romashka kernel: tap0: changing name to 'virbr-ab-nic'
>> Aug 5 19:02:42 romashka kernel: virbr-ab-nic: promiscuous mode enabled
>> Aug 5 19:02:42 romashka kernel: virbr-ab: link state changed to UP
>> Aug 5 19:02:42 romashka rtsold[585]: interface
>> tap0 removed
>> Aug 5 19:02:43 romashka dnsmasq[1047]: setting --bind-interfaces option
>> because of OS limitations
>> Aug 5 19:02:43 romashka dnsmasq[1047]: warning: no upstream servers
>> configured
>> Aug 5 19:02:43 romashka kernel: virbr-ab-nic: link state changed to DOWN
>> Aug 5 19:02:43 romashka kernel: virbr-ab: link state changed to DOWN
>> Aug 5 19:02:43 romashka kernel: bridge1: Ethernet address:
>> 02:af:41:48:c7:01
>> Aug 5 19:02:43 romashka kernel: bridge1: changing name to 'virbr0'
>> Aug 5 19:02:43 romashka rtsold[585]: interface
>> bridge1 removed
>> Aug 5 19:02:43 romashka kernel: tap1: Ethernet address: 00:bd:53:14:f7:01
>> Aug 5 19:02:43 romashka kernel: tap1: link state changed to UP
>> Aug 5 19:02:43 romashka kernel: tap1: changing name to 'virbr0-nic'
>> Aug 5 19:02:43 romashka kernel: virbr0: link state changed to UP
>> Aug 5 19:02:43 romashka kernel: virbr0-nic: promiscuous mode enabled
>> Aug 5 19:02:43 romashka rtsold[585]: interface
>> tap1 removed
>> Aug 5 19:05:03 romashka syslogd: kernel boot file is /boot/kernel/kernel
>> Aug 5 19:05:03 romashka kernel:
>> Aug 5 19:05:03 romashka syslogd: last message repeated 1 times
>> Aug 5 19:05:03 romashka kernel: Fatal trap 12: page fault while in kernel
>> mode
>>
>> If I disable libvirt service, system completes booting fine.
>> On start, libvirt creates a couple of bridge(4) and tap(4) devices,
>> adds the tap devices to the bridges it created, and possibly destroys
>> these interfaces in case of errors. It also starts dnsmasq on some of
>> these interfaces.
>>
>> This problem started to appear about 2-4 weeks ago.
>>
>> Roman Bogorodskiy