Date: Thu, 20 Aug 2015 15:26:10 -0500
From: Mark Felder <feld@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-current@freebsd.org
Subject: Re: pkg with an ssh repo crashes CURRENT
Message-ID: <1440102370.941813.361650057.269DD227@webmail.messagingengine.com>
In-Reply-To: <20150820115041.GU2072@kib.kiev.ua>
References: <1440014993.2793501.360634953.2FF3B076@webmail.messagingengine.com> <1440021176.3252738.360727753.7FEDAB82@webmail.messagingengine.com> <20150820115041.GU2072@kib.kiev.ua>

On Thu, Aug 20, 2015, at 06:50, Konstantin Belousov wrote:
> On Wed, Aug 19, 2015 at 04:52:56PM -0500, Mark Felder wrote:
> > panic: children list
> > cpuid = 0
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01228ea840
> > vpanic() at vpanic+0x189/frame 0xfffffe01228ea8c0
> > kassert_panic() at kassert_panic+0x132/frame 0xfffffe01228ea930
> > kern_procctl_single() at kern_procctl_single+0x81c/frame 0xfffffe01228eaa00
> > kern_procctl() at kern_procctl+0x223/frame 0xfffffe01228eaa50
> > sys_procctl() at sys_procctl+0xa5/frame 0xfffffe01228eaae0
> > amd64_syscall() at amd64_syscall+0x282/frame 0xfffffe01228eabf0
> > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01228eabf0
>
> The fired assert means that there was a reaper process with some children
> but without descendants to be reaped. Hm, I can imagine this situation
> to happen if e.g. some not-reaper forks and then acquires reaper status.
> The patch below removes too aggressive asserts.
>
> Still, it would be interesting to look into the process table. Please
> repeat the procedure to panic, then in ddb do 'ps'. After that do
> 'dump' and please keep kernel.debug and vmcore around. First I want to
> look at the ps output.

I've recreated this in a bhyve VM with the latest CURRENT snapshot,
r286893. You can grab the whole /var/crash dump at
https://feld.me/freebsd/crash.tar.gz

I've pasted the ps output below, but it's also included in the info.0
file.

Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db> ps
   pid  ppid  pgrp   uid  state  wmesg    wchan              cmd
   667   666   665     0  S+     select   0xfffff80003c53840 ssh
   666   665   665     0  R+     CPU 0                       pkg
   665   629   665     0  S+     wait     0xfffff800039e0548 pkg
   629   628   629     0  S+     pause    0xfffff8001947eb38 csh
   628     1   628     0  Ss+    wait     0xfffff80003db8a90 login
   627     1   627     0  Ss+    ttyin    0xfffff80003c0f0a8 getty
   626     1   626     0  Ss+    ttyin    0xfffff80003c0f4a8 getty
   625     1   625     0  Ss+    ttyin    0xfffff8000387a0a8 getty
   624     1   624     0  Ss+    ttyin    0xfffff8000387a4a8 getty
   623     1   623     0  Ss+    ttyin    0xfffff8000387a8a8 getty
   622     1   622     0  Ss+    ttyin    0xfffff8000387aca8 getty
   621     1   621     0  Ss+    ttyin    0xfffff8000387b0a8 getty
   620     1   620     0  Ss+    ttyin    0xfffff8000387b4a8 getty
   577     1   577     0  Ss     nanslp   0xffffffff81ab2561 cron
   573     1   573    25  Ss     pause    0xfffff80003d040a8 sendmail
   570     1   570     0  Ss     select   0xfffff80003849c40 sendmail
   542     1   542     0  Ss     select   0xfffff80003c53ec0 sshd
   443     1   443     0  Ss     select   0xfffff80003849d40 casperd
   442     1   442     0  Ss     select   0xfffff80003c540c0 casperd
   342     1   342     0  Ss     select   0xfffff80003849dc0 syslogd
   271     1   271     0  Ss     select   0xfffff80003849ec0 devd
    16     0     0     0  DL     vlruwt   0xfffff800039e0a90 [vnlru]
    15     0     0     0  DL     syncer   0xffffffff81c41cf8 [syncer]
    14     0     0     0  DL     (threaded)                  [bufdaemon]
100042                    D      psleep   0xffffffff81c40f04 [bufdaemon]
100057                    D      sdflush  0xfffff80003d870e8 [/ worker]
     9     0     0     0  DL     pgzero   0xffffffff81c4aee4 [pagezero]
     8     0     0     0  DL     psleep   0xffffffff81c4a6b8 [vmdaemon]
     7     0     0     0  DL     (threaded)                  [pagedaemon]
100039                    D      psleep   0xffffffff81cf6684 [pagedaemon]
100045                    D      umarcl   0xffffffff81c4a040 [uma]
     6     0     0     0  DL     waiting_ 0xffffffff81ce8640 [sctp_iterator]
     5     0     0     0  DL     (threaded)                  [cam]
100017                    D      -        0xffffffff818d6e00 [doneq0]
100038                    D      -        0xffffffff818d6c48 [scanner]
     4     0     0     0  DL     crypto_r 0xffffffff81c48b88 [crypto returns]
     3     0     0     0  DL     crypto_w 0xffffffff81c48a30 [crypto]
    13     0     0     0  DL     (threaded)                  [geom]
100010                    D      -        0xffffffff81cc0aa0 [g_event]
100011                    D      -        0xffffffff81cc0aa8 [g_up]
100012                    D      -        0xffffffff81cc0ab0 [g_down]
    12     0     0     0  WL     (threaded)                  [intr]
100006                    I                                  [swi4: clock (0)]
100007                    I                                  [swi4: clock (1)]
100008                    I                                  [swi3: vm]
100009                    I                                  [swi1: netisr 0]
100018                    I                                  [swi6: task queue]
100019                    I                                  [swi6: Giant taskq]
100021                    I                                  [swi5: fast taskq]
100026                    I                                  [irq264: virtio_pci0]
100027                    I                                  [irq265: virtio_pci0]
100028                    I                                  [irq266: virtio_pci0]
100031                    I                                  [irq267: virtio_pci1]
100032                    I                                  [irq268: virtio_pci1]
100033                    I                                  [swi0: uart uart]
100034                    I                                  [irq1: atkbd0]
    11     0     0     0  RL     (threaded)                  [idle]
100004                    CanRun                             [idle: cpu0]
100005                    Run    CPU 1                       [idle: cpu1]
     2     0     0     0  DL     -        0xffffffff81a03ca0 [rand_harvestq]
     1     0     1     0  SLs    wait     0xfffff8000362f548 [init]
    10     0     0     0  DL     audit_wo 0xffffffff81cedc10 [audit]
     0     0     0     0  DLs    (threaded)                  [kernel]
100000                    D      swapin   0xffffffff81cc0ad8 [swapper]
100013                    D      -        0xfffff80003611300 [firmware taskq]
100016                    D      -        0xfffff80003610e00 [ffs_trim taskq]
100020                    D      -        0xfffff80003610400 [thread taskq]
100022                    D      -        0xfffff80003820100 [acpi_task_0]
100023                    D      -        0xfffff80003820100 [acpi_task_1]
100024                    D      -        0xfffff80003820100 [acpi_task_2]
100025                    D      -        0xfffff8000381fc00 [kqueue taskq]
100029                    D      -        0xfffff8000381f200 [vtnet0 rxq 0]
100030                    D      -        0xfffff8000381f100 [vtnet0 txq 0]
100035                    D      -        0xffffffff81ab1330 [deadlkres]
100037                    D      -        0xfffff80003610c00 [CAM taskq]

>
> diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c
> index d65ba5a..8ef72901 100644
> --- a/sys/kern/kern_procctl.c
> +++ b/sys/kern/kern_procctl.c
> @@ -187,8 +187,6 @@ reap_status(struct thread *td, struct proc *p,
>  		}
>  	} else {
>  		rs->rs_pid = -1;
> -		KASSERT(LIST_EMPTY(&reap->p_reaplist), ("reap children list"));
> -		KASSERT(LIST_EMPTY(&reap->p_children), ("children list"));
>  	}
>  	return (0);
>  }

I'll try compiling a kernel with your patch and see what happens.