From: Don Lewis <truckman@FreeBSD.org>
Date: Thu, 27 Aug 2015 00:15:51 -0700 (PDT)
Subject: Re: Instant panic while trying run ports-mgmt/poudriere
To: lstewart@room52.net
cc: jmg@funkthat.com, avg@FreeBSD.org, freebsd-current@FreeBSD.org, pawel@FreeBSD.org, kmacy@FreeBSD.org
Message-Id: <201508270715.t7R7Fp5S004143@gw.catspoiler.org>

On 27 Aug, Don Lewis wrote:
> On 27 Aug, Lawrence Stewart wrote:
>> On 08/27/15 09:36, John-Mark Gurney wrote:
>>> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>>> Hi K.,
>>>>>>
>>>>>> On 2015-08-06 12:33 -0700, "K. Macy" wrote:
>>>>>>> Is this still happening?
>>>>>>
>>>>>> Still crashes:
>>>>>
>>>>> +1 for me running r286617
>>>>
>>>> Here is another +1 with r286922.
>>>> I can add a couple of bits of debugging data:
>>>>
>>>> (kgdb) fr 8
>>>> #8 0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,
>>>> hint=2147483648, lockflags=) at
>>>> /usr/src/sys/kern/kern_event.c:1964
>>>> 1964 } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>>> (kgdb) p *list
>>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
>>>
>>> We should not (cannot) get here with an empty list. If we do, then
>>> something is seriously wrong... The current kn (which we must have, as
>>> we are here) MUST be on the list, but as you just showed, there are no
>>> knotes on the list.
>>>
>>> Can you get me a print of the knote? That way I can see what flags
>>> are set on it.
>>
>> I quickly tried to get this info for you by building my kernel with -O0
>> and reproducing, but I get an insta-panic on boot with the new kernel:
>>
>> Fatal double fault
>> rip = 0xffffffff8218c794
>> rsp = 0xfffffe044cdc9fe0
>> rbp = 0xfffffe044cdca110
>> cpuid = 2; apic id = 02
>> panic: double fault
>> cpuid = 2
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfffffe03dcfffe30
>> vpanic() at vpanic+0x189/frame 0xfffffe03dcfffeb0
>> panic() at panic+0x43/frame 0xfffffe03dcffff10
>> dblfault_handler() at dblfault_handler+0xa2/frame 0xfffffe03dcffff30
>> Xdblfault() at Xdblfault+0xac/frame 0xfffffe03dcffff30
>> --- trap 0x17, rip = 0xffffffff8218c794, rsp = 0xfffffe044cdc9fe0, rbp =
>> 0xfffffe044cdca110 ---
>> vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfffffe044cdca110
>> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame
>> 0xfffffe044cdca560
>> vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfffffe044cdca5b0
>> zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfffffe044cdca6e0
>> zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca730
>> zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca760
>> vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame
>> 0xfffffe044cdca800
>> zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfffffe044cdca930
>> zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca980
>> zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca9b0
>> spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfffffe044cdcaa50
>> traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfffffe044cdcac60
>> traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcacd0
>> traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfffffe044cdcaee0
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb0f0
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb300
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb510
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb720
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb930
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcbb40
>> traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcbbb0
>> traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfffffe044cdcbdc0
>> traverse_impl() at traverse_impl+0x79d/frame 0xfffffe044cdcbfd0
>> traverse_dataset() at traverse_dataset+0x93/frame 0xfffffe044cdcc040
>> traverse_pool() at traverse_pool+0x1f2/frame 0xfffffe044cdcc140
>> spa_load_verify() at spa_load_verify+0xf3/frame 0xfffffe044cdcc1f0
>> spa_load_impl() at spa_load_impl+0x2069/frame 0xfffffe044cdcc610
>> spa_load() at spa_load+0x320/frame 0xfffffe044cdcc6d0
>> spa_load_impl() at spa_load_impl+0x150b/frame 0xfffffe044cdccaf0
>> spa_load() at spa_load+0x320/frame 0xfffffe044cdccbb0
>> spa_load_best() at spa_load_best+0xc6/frame 0xfffffe044cdccc50
>> spa_open_common() at spa_open_common+0x246/frame 0xfffffe044cdccd40
>> spa_open() at spa_open+0x35/frame 0xfffffe044cdccd70
>> dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfffffe044cdccdb0
>> dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfffffe044cdcce30
>> zfsvfs_create() at zfsvfs_create+0x100/frame 0xfffffe044cdcd050
>> zfs_domount() at zfs_domount+0xa7/frame 0xfffffe044cdcd0e0
>> zfs_mount() at zfs_mount+0x6c3/frame 0xfffffe044cdcd390
>> vfs_donmount() at vfs_donmount+0x1330/frame 0xfffffe044cdcd660
>> kernel_mount() at kernel_mount+0x62/frame 0xfffffe044cdcd6c0
>> parse_mount() at parse_mount+0x668/frame 0xfffffe044cdcd810
>> vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfffffe044cdcd9d0
>> start_init() at start_init+0x62/frame 0xfffffe044cdcda70
>> fork_exit() at fork_exit+0x84/frame 0xfffffe044cdcdab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe044cdcdab0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> KDB: enter: panic
>>
>> Didn't get a core because it panics before dumpdev is set.
>>
>> Is anyone else able to run -O0 kernels, or do I have something set to evil?
>
> As I recall, double faults are commonly caused by overflowing the kernel
> stack. If I subtract the values of the first and last frame pointers, I
> get 14752 bytes, which is getting pretty large, and rsp and rbp in the trap
> point to different 4K pages, so a stack overflow certainly looks possible.
>
> Try bumping up KSTACK_PAGES in your kernel config.

Actually, that's not necessary anymore, since it was made into a tunable in
-CURRENT fairly recently. Just set kern.kstack_pages to something larger in
loader.conf.
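The stack-depth estimate above can be redone with a small C sketch. It assumes 4 KiB pages (as on amd64) and takes its addresses from the backtrace: the innermost frame is the faulting vdev_queue_aggregate() frame and the outermost is fork_trampoline(). The helper names frame_span(), page_index(), same_page() and pages_needed() are hypothetical and are not part of the FreeBSD kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on amd64. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Bytes of stack between the innermost frame pointer (the frame that
 * faulted) and the outermost one (fork_trampoline()).  The kernel stack
 * grows downward, so the outermost frame has the higher address.
 */
static uint64_t frame_span(uint64_t innermost_fp, uint64_t outermost_fp)
{
	assert(outermost_fp >= innermost_fp);
	return outermost_fp - innermost_fp;
}

/* Index of the page that contains addr. */
static uint64_t page_index(uint64_t addr)
{
	return addr / SKETCH_PAGE_SIZE;
}

/* True if both addresses lie in the same page. */
static bool same_page(uint64_t a, uint64_t b)
{
	return page_index(a) == page_index(b);
}

/* Number of whole pages needed to hold nbytes, rounding up. */
static uint64_t pages_needed(uint64_t nbytes)
{
	return (nbytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With the values from the panic, frame_span(0xfffffe044cdca110, 0xfffffe044cdcdab0) is 0x39a0 = 14752 bytes, which pages_needed() rounds up to 4 pages. same_page() on the trap's rsp (0xfffffe044cdc9fe0) and rbp (0xfffffe044cdca110) is false, because rsp sits 0x20 bytes below the page boundary at 0xfffffe044cdca000. How much headroom that leaves depends on the configured kern.kstack_pages, which the thread does not state.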