Date:      Fri, 21 Aug 2015 01:18:52 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mark Felder <feld@FreeBSD.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: pkg with an ssh repo crashes CURRENT
Message-ID:  <20150820221852.GX2072@kib.kiev.ua>
In-Reply-To: <1440102370.941813.361650057.269DD227@webmail.messagingengine.com>
References:  <1440014993.2793501.360634953.2FF3B076@webmail.messagingengine.com> <1440021176.3252738.360727753.7FEDAB82@webmail.messagingengine.com> <20150820115041.GU2072@kib.kiev.ua> <1440102370.941813.361650057.269DD227@webmail.messagingengine.com>

On Thu, Aug 20, 2015 at 03:26:10PM -0500, Mark Felder wrote:
> 
> 
> On Thu, Aug 20, 2015, at 06:50, Konstantin Belousov wrote:
> > On Wed, Aug 19, 2015 at 04:52:56PM -0500, Mark Felder wrote:
> > > panic: children list
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01228ea840
> > > vpanic() at vpanic+0x189/frame 0xfffffe01228ea8c0
> > > kassert_panic() at kassert_panic+0x132/frame 0xfffffe01228ea930
> > > kern_procctl_single() at kern_procctl_single+0x81c/frame 0xfffffe01228eaa00
> > > kern_procctl() at kern_procctl+0x223/frame 0xfffffe01228eaa50
> > > sys_procctl() at sys_procctl+0xa5/frame 0xfffffe01228eaae0
> > > amd64_syscall() at amd64_syscall+0x282/frame 0xfffffe01228eabf0
> > > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01228eabf0
> > 
> > The fired assert means that there was a reaper process with some children
> > (non-empty p_children) but without descendants to be reaped (empty
> > p_reaplist).  Hm, I can imagine this situation happening if, e.g., a
> > non-reaper process forks and then acquires reaper status.  The patch
> > below removes the overly aggressive asserts.
> > 
> > Still, it would be interesting to look into the process table.  Please
> > repeat the steps that trigger the panic, then run 'ps' in ddb.  After
> > that, run 'dump' and keep kernel.debug and vmcore around.  First I want
> > to look at the ps output.
> 
> I've recreated this in a bhyve VM with the latest CURRENT snapshot,
> r286893. You can grab the whole /var/crash dump at
> https://feld.me/freebsd/crash.tar.gz
The vmcore is useless without the matching kernel.debug.

> 
> I've pasted the ps output below, but it's also included in the info.0
> file.
And this is not very useful without the preceding panic message and other
bits from the panic handler.

I guess process 666 was current when the panic occurred?
Basically, what I want is to see the p_reaper value for the process
with the pid 667.  Even just p_reaper->p_pid is enough.
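To illustrate the suspected sequence, here is a small userspace model (not kernel code) of the p_children/p_reaplist bookkeeping. The struct mproc type and the model_* and old_asserts_hold() functions are hypothetical, and lists are reduced to counters. The model assumes that fork links the child onto its parent's children list and onto the reap list of the parent (if the parent is a reaper) or of the parent's own reaper, and that PROC_REAP_ACQUIRE only marks the caller as a reaper without reattaching descendants that already exist. It also assumes neither 665 nor 666 was a reaper when they forked. Under those assumptions, pid 667's reaper is the root, not 666, which is what checking p_reaper->p_pid for pid 667 would reveal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of reaper bookkeeping; not kernel API. */
#define MAXPROC 16

struct mproc {
	int pid;
	struct mproc *parent;	/* models p_pptr */
	struct mproc *reaper;	/* models p_reaper */
	bool is_reaper;		/* models P_TREE_REAPER in p_treeflag */
	int nchildren;		/* length of the modelled p_children list */
	int nreaplist;		/* length of the modelled p_reaplist list */
};

static struct mproc procs[MAXPROC];
static int nprocs;

/* Root of the model (pid 1); simplification: it is its own reaper. */
static struct mproc *model_init(void)
{
	assert(nprocs < MAXPROC);
	struct mproc *p = &procs[nprocs++];
	p->pid = 1;
	p->parent = NULL;
	p->reaper = p;
	p->is_reaper = true;
	p->nchildren = 0;
	p->nreaplist = 0;
	return p;
}

/* fork: reaper is the parent if it is a reaper, else the parent's reaper. */
static struct mproc *model_fork(struct mproc *parent, int pid)
{
	assert(nprocs < MAXPROC);
	struct mproc *c = &procs[nprocs++];
	c->pid = pid;
	c->parent = parent;
	c->reaper = parent->is_reaper ? parent : parent->reaper;
	c->is_reaper = false;
	c->nchildren = 0;
	c->nreaplist = 0;
	parent->nchildren++;	/* child joins the parent's children list */
	c->reaper->nreaplist++;	/* and its reaper's reap list */
	return c;
}

/* Acquiring reaper status only sets the flag; existing children keep
 * their old reaper, so this process's reap list is not populated. */
static void model_reap_acquire(struct mproc *p)
{
	p->is_reaper = true;
}

/* Mirrors the two removed KASSERTs: they ran only when the reap list
 * was empty and required the children list to be empty as well.
 * Returns true when those asserts would have passed. */
static bool old_asserts_hold(const struct mproc *reap)
{
	if (reap->nreaplist != 0)
		return true;
	return reap->nchildren == 0;
}
```

Forking 665 from the root, 666 from 665 and 667 from 666, then calling model_reap_acquire() on 666, leaves 666 with one child and an empty reap list: the combination the removed "children list" KASSERT rejected. A child forked after the acquire (a hypothetical pid 668) does land on 666's reap list.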

> 
> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
> db> ps
>   pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
>   667   666   665     0  S+      select   0xfffff80003c53840 ssh
>   666   665   665     0  R+      CPU 0                       pkg
>   665   629   665     0  S+      wait     0xfffff800039e0548 pkg
>   629   628   629     0  S+      pause    0xfffff8001947eb38 csh
>   628     1   628     0  Ss+     wait     0xfffff80003db8a90 login
>   627     1   627     0  Ss+     ttyin    0xfffff80003c0f0a8 getty
>   626     1   626     0  Ss+     ttyin    0xfffff80003c0f4a8 getty
>   625     1   625     0  Ss+     ttyin    0xfffff8000387a0a8 getty
>   624     1   624     0  Ss+     ttyin    0xfffff8000387a4a8 getty
>   623     1   623     0  Ss+     ttyin    0xfffff8000387a8a8 getty
>   622     1   622     0  Ss+     ttyin    0xfffff8000387aca8 getty
>   621     1   621     0  Ss+     ttyin    0xfffff8000387b0a8 getty
>   620     1   620     0  Ss+     ttyin    0xfffff8000387b4a8 getty
>   577     1   577     0  Ss      nanslp   0xffffffff81ab2561 cron
>   573     1   573    25  Ss      pause    0xfffff80003d040a8 sendmail
>   570     1   570     0  Ss      select   0xfffff80003849c40 sendmail
>   542     1   542     0  Ss      select   0xfffff80003c53ec0 sshd
>   443     1   443     0  Ss      select   0xfffff80003849d40 casperd
>   442     1   442     0  Ss      select   0xfffff80003c540c0 casperd
>   342     1   342     0  Ss      select   0xfffff80003849dc0 syslogd
>   271     1   271     0  Ss      select   0xfffff80003849ec0 devd
>    16     0     0     0  DL      vlruwt   0xfffff800039e0a90 [vnlru]
>    15     0     0     0  DL      syncer   0xffffffff81c41cf8 [syncer]
>    14     0     0     0  DL      (threaded)                  [bufdaemon]
> 100042                   D       psleep   0xffffffff81c40f04 [bufdaemon]
> 100057                   D       sdflush  0xfffff80003d870e8 [/ worker]
>     9     0     0     0  DL      pgzero   0xffffffff81c4aee4 [pagezero]
>     8     0     0     0  DL      psleep   0xffffffff81c4a6b8 [vmdaemon]
>     7     0     0     0  DL      (threaded)                  [pagedaemon]
> 100039                   D       psleep   0xffffffff81cf6684 [pagedaemon]
> 100045                   D       umarcl   0xffffffff81c4a040 [uma]
>     6     0     0     0  DL      waiting_ 0xffffffff81ce8640 [sctp_iterator]
>     5     0     0     0  DL      (threaded)                  [cam]
> 100017                   D       -        0xffffffff818d6e00 [doneq0]
> 100038                   D       -        0xffffffff818d6c48 [scanner]
>     4     0     0     0  DL      crypto_r 0xffffffff81c48b88 [crypto returns]
>     3     0     0     0  DL      crypto_w 0xffffffff81c48a30 [crypto]
>    13     0     0     0  DL      (threaded)                  [geom]
> 100010                   D       -        0xffffffff81cc0aa0 [g_event]
> 100011                   D       -        0xffffffff81cc0aa8 [g_up]
> 100012                   D       -        0xffffffff81cc0ab0 [g_down]
>    12     0     0     0  WL      (threaded)                  [intr]
> 100006                   I                                   [swi4: clock (0)]
> 100007                   I                                   [swi4: clock (1)]
> 100008                   I                                   [swi3: vm]
> 100009                   I                                   [swi1: netisr 0]
> 100018                   I                                   [swi6: task queue]
> 100019                   I                                   [swi6: Giant taskq]
> 100021                   I                                   [swi5: fast taskq]
> 100026                   I                                   [irq264: virtio_pci0]
> 100027                   I                                   [irq265: virtio_pci0]
> 100028                   I                                   [irq266: virtio_pci0]
> 100031                   I                                   [irq267: virtio_pci1]
> 100032                   I                                   [irq268: virtio_pci1]
> 100033                   I                                   [swi0: uart uart]
> 100034                   I                                   [irq1: atkbd0]
>    11     0     0     0  RL      (threaded)                  [idle]
> 100004                   CanRun                              [idle: cpu0]
> 100005                   Run     CPU 1                       [idle: cpu1]
>     2     0     0     0  DL      -        0xffffffff81a03ca0 [rand_harvestq]
>     1     0     1     0  SLs     wait     0xfffff8000362f548 [init]
>    10     0     0     0  DL      audit_wo 0xffffffff81cedc10 [audit]
>     0     0     0     0  DLs     (threaded)                  [kernel]
> 100000                   D       swapin   0xffffffff81cc0ad8 [swapper]
> 100013                   D       -        0xfffff80003611300 [firmware taskq]
> 100016                   D       -        0xfffff80003610e00 [ffs_trim taskq]
> 100020                   D       -        0xfffff80003610400 [thread taskq]
> 100022                   D       -        0xfffff80003820100 [acpi_task_0]
> 100023                   D       -        0xfffff80003820100 [acpi_task_1]
> 100024                   D       -        0xfffff80003820100 [acpi_task_2]
> 100025                   D       -        0xfffff8000381fc00 [kqueue taskq]
> 100029                   D       -        0xfffff8000381f200 [vtnet0 rxq 0]
> 100030                   D       -        0xfffff8000381f100 [vtnet0 txq 0]
> 100035                   D       -        0xffffffff81ab1330 [deadlkres]
> 100037                   D       -        0xfffff80003610c00 [CAM taskq]
> 
> > 
> > diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c
> > index d65ba5a..8ef72901 100644
> > --- a/sys/kern/kern_procctl.c
> > +++ b/sys/kern/kern_procctl.c
> > @@ -187,8 +187,6 @@ reap_status(struct thread *td, struct proc *p,
> >  		}
> >  	} else {
> >  		rs->rs_pid = -1;
> > -		KASSERT(LIST_EMPTY(&reap->p_reaplist), ("reap children list"));
> > -		KASSERT(LIST_EMPTY(&reap->p_children), ("children list"));
> >  	}
> >  	return (0);
> >  }
> 
> I'll try compiling a kernel with your patch and see what happens.


