Date: Fri, 21 Aug 2015 01:18:52 +0300
From: Konstantin Belousov
To: Mark Felder
Cc: freebsd-current@freebsd.org
Subject: Re: pkg with an ssh repo crashes CURRENT
On Thu, Aug 20, 2015 at 03:26:10PM -0500, Mark Felder wrote:
>
>
> On Thu, Aug 20, 2015, at 06:50, Konstantin Belousov wrote:
> > On Wed, Aug 19, 2015 at 04:52:56PM -0500, Mark Felder wrote:
> > > panic: children list
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01228ea840
> > > vpanic() at vpanic+0x189/frame 0xfffffe01228ea8c0
> > > kassert_panic() at kassert_panic+0x132/frame 0xfffffe01228ea930
> > > kern_procctl_single() at kern_procctl_single+0x81c/frame 0xfffffe01228eaa00
> > > kern_procctl() at kern_procctl+0x223/frame 0xfffffe01228eaa50
> > > sys_procctl() at sys_procctl+0xa5/frame 0xfffffe01228eaae0
> > > amd64_syscall() at amd64_syscall+0x282/frame 0xfffffe01228eabf0
> > > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01228eabf0
> >
> > The fired assert means that there was a reaper process with some children
> > but without descendants to be reaped.  Hm, I can imagine this situation
> > happening if e.g. some non-reaper forks and then acquires reaper status.
> > The patch below removes the overly aggressive asserts.
> >
> > Still, it would be interesting to look into the process table.  Please
> > repeat the procedure to get the panic, then in ddb do 'ps'.  After that
> > do 'dump' and please keep kernel.debug and vmcore around.  First I want
> > to look at the ps output.
>
> I've recreated this in a bhyve VM with the latest CURRENT snapshot,
> r286893.
> You can grab the whole /var/crash dump at
> https://feld.me/freebsd/crash.tar.gz

A vmcore is useless without the matching kernel.debug.

>
> I've pasted the ps output below, but it's also included in the info.0
> file.

And this is not very useful without the preceding panic message and
the other bits from the panic handler.  I guess process 666 was current
when the panic occurred?

Basically, what I want is to see the p_reaper value for the process
with pid 667.  Even just p_reaper->p_pid is enough.

>
> Stopped at kdb_enter+0x3e: movq $0,kdb_why
> db> ps
>    pid  ppid  pgrp   uid  state   wmesg     wchan              cmd
>    667   666   665     0  S+      select    0xfffff80003c53840 ssh
>    666   665   665     0  R+      CPU 0                        pkg
>    665   629   665     0  S+      wait      0xfffff800039e0548 pkg
>    629   628   629     0  S+      pause     0xfffff8001947eb38 csh
>    628     1   628     0  Ss+     wait      0xfffff80003db8a90 login
>    627     1   627     0  Ss+     ttyin     0xfffff80003c0f0a8 getty
>    626     1   626     0  Ss+     ttyin     0xfffff80003c0f4a8 getty
>    625     1   625     0  Ss+     ttyin     0xfffff8000387a0a8 getty
>    624     1   624     0  Ss+     ttyin     0xfffff8000387a4a8 getty
>    623     1   623     0  Ss+     ttyin     0xfffff8000387a8a8 getty
>    622     1   622     0  Ss+     ttyin     0xfffff8000387aca8 getty
>    621     1   621     0  Ss+     ttyin     0xfffff8000387b0a8 getty
>    620     1   620     0  Ss+     ttyin     0xfffff8000387b4a8 getty
>    577     1   577     0  Ss      nanslp    0xffffffff81ab2561 cron
>    573     1   573    25  Ss      pause     0xfffff80003d040a8 sendmail
>    570     1   570     0  Ss      select    0xfffff80003849c40 sendmail
>    542     1   542     0  Ss      select    0xfffff80003c53ec0 sshd
>    443     1   443     0  Ss      select    0xfffff80003849d40 casperd
>    442     1   442     0  Ss      select    0xfffff80003c540c0 casperd
>    342     1   342     0  Ss      select    0xfffff80003849dc0 syslogd
>    271     1   271     0  Ss      select    0xfffff80003849ec0 devd
>     16     0     0     0  DL      vlruwt    0xfffff800039e0a90 [vnlru]
>     15     0     0     0  DL      syncer    0xffffffff81c41cf8 [syncer]
>     14     0     0     0  DL      (threaded)                   [bufdaemon]
> 100042                    D       psleep    0xffffffff81c40f04 [bufdaemon]
> 100057                    D       sdflush   0xfffff80003d870e8 [/ worker]
>      9     0     0     0  DL      pgzero    0xffffffff81c4aee4 [pagezero]
>      8     0     0     0  DL      psleep    0xffffffff81c4a6b8 [vmdaemon]
>      7     0     0     0  DL      (threaded)                   [pagedaemon]
> 100039                    D       psleep    0xffffffff81cf6684 [pagedaemon]
> 100045                    D       umarcl    0xffffffff81c4a040 [uma]
>      6     0     0     0  DL      waiting_  0xffffffff81ce8640 [sctp_iterator]
>      5     0     0     0  DL      (threaded)                   [cam]
> 100017                    D       -         0xffffffff818d6e00 [doneq0]
> 100038                    D       -         0xffffffff818d6c48 [scanner]
>      4     0     0     0  DL      crypto_r  0xffffffff81c48b88 [crypto returns]
>      3     0     0     0  DL      crypto_w  0xffffffff81c48a30 [crypto]
>     13     0     0     0  DL      (threaded)                   [geom]
> 100010                    D       -         0xffffffff81cc0aa0 [g_event]
> 100011                    D       -         0xffffffff81cc0aa8 [g_up]
> 100012                    D       -         0xffffffff81cc0ab0 [g_down]
>     12     0     0     0  WL      (threaded)                   [intr]
> 100006                    I                                    [swi4: clock (0)]
> 100007                    I                                    [swi4: clock (1)]
> 100008                    I                                    [swi3: vm]
> 100009                    I                                    [swi1: netisr 0]
> 100018                    I                                    [swi6: task queue]
> 100019                    I                                    [swi6: Giant taskq]
> 100021                    I                                    [swi5: fast taskq]
> 100026                    I                                    [irq264: virtio_pci0]
> 100027                    I                                    [irq265: virtio_pci0]
> 100028                    I                                    [irq266: virtio_pci0]
> 100031                    I                                    [irq267: virtio_pci1]
> 100032                    I                                    [irq268: virtio_pci1]
> 100033                    I                                    [swi0: uart uart]
> 100034                    I                                    [irq1: atkbd0]
>     11     0     0     0  RL      (threaded)                   [idle]
> 100004                    CanRun                               [idle: cpu0]
> 100005                    Run     CPU 1                        [idle: cpu1]
>      2     0     0     0  DL      -         0xffffffff81a03ca0 [rand_harvestq]
>      1     0     1     0  SLs     wait      0xfffff8000362f548 [init]
>     10     0     0     0  DL      audit_wo  0xffffffff81cedc10 [audit]
>      0     0     0     0  DLs     (threaded)                   [kernel]
> 100000                    D       swapin    0xffffffff81cc0ad8 [swapper]
> 100013                    D       -         0xfffff80003611300 [firmware taskq]
> 100016                    D       -         0xfffff80003610e00 [ffs_trim taskq]
> 100020                    D       -         0xfffff80003610400 [thread taskq]
> 100022                    D       -         0xfffff80003820100 [acpi_task_0]
> 100023                    D       -         0xfffff80003820100 [acpi_task_1]
> 100024                    D       -         0xfffff80003820100 [acpi_task_2]
> 100025                    D       -         0xfffff8000381fc00 [kqueue taskq]
> 100029                    D       -         0xfffff8000381f200 [vtnet0 rxq 0]
> 100030                    D       -         0xfffff8000381f100 [vtnet0 txq 0]
> 100035                    D       -         0xffffffff81ab1330 [deadlkres]
> 100037                    D       -         0xfffff80003610c00 [CAM taskq]
>
> >
> > diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c
> > index d65ba5a..8ef72901 100644
> > --- a/sys/kern/kern_procctl.c
> > +++ b/sys/kern/kern_procctl.c
> > @@ -187,8 +187,6 @@ reap_status(struct thread *td, struct proc *p,
> >  		}
> >  	} else {
> >  		rs->rs_pid = -1;
> > -		KASSERT(LIST_EMPTY(&reap->p_reaplist), ("reap children list"));
> > -		KASSERT(LIST_EMPTY(&reap->p_children), ("children list"));
> >  	}
> >  	return (0);
> >  }
>
> I'll try compiling a kernel with your patch and see what happens.