From nobody Wed Sep 1 15:07:33 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 9A3EF1340AA0 for ; Wed, 1 Sep 2021 15:07:44 +0000 (UTC) (envelope-from ian@freebsd.org) Received: from outbound4v.ore.mailhop.org (outbound4v.ore.mailhop.org [44.233.143.45]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4H06qh0zM9z4dCq for ; Wed, 1 Sep 2021 15:07:44 +0000 (UTC) (envelope-from ian@freebsd.org) ARC-Seal: i=1; a=rsa-sha256; t=1630508857; cv=none; d=outbound.mailhop.org; s=arc-outbound20181012; b=iIBOTUODgFfPEKjq2+FVGnr75srX7w+ws5qO6l5Y6Hp7hvE/cI+eOpC5QwsPGM0j8n0MmyJqkQN77 +m3c/W8/iREzEVsEUCfqpZXE2rUN3exqrPm173tduXvTOm58sgDx9ncphr0h6avkONevYKiJ0bXz/b mI+kJBLXegHCvQVY9dsbygjyeS+Z2uLG0beJtPzXMIHa3o9vsaAb5BLb5a41LmXazB1XLhRrrI+mB7 PM2SsyD/MFV3k7OjdhfSrFQUW0++kZldqC3tPrkM7VpesYsiwhL42wz9SW/26+qAztK86xRNeLOTTm VpgB2nw7J0AnWrnd0mKtkyE49hez2UA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=outbound.mailhop.org; s=arc-outbound20181012; h=content-transfer-encoding:mime-version:content-type:references:in-reply-to: date:cc:to:from:subject:message-id:dkim-signature:from; bh=9aPUm6ZvhzAhZkunctYJL3HQVD8J+T/uyu54Xa8o/cI=; b=gego9JmS9mTZnngRvs/oM/lbuvrxLMOqU2+s9WzaMCf3C87dcNaFXujgnzGKduxVIo1KVEad9r0Pz Py575GCmWaysbrAh4TdPCP3rC1x+c+/21Ah8yXR8LUKP+vqUmp5bao4xWrzGtxjzCdAmuDKX3aUYWB IU2oLCTELiLw6P4EhC8XYGl1WrEVRnNuIlJrsk7pOxBWNmrHVz6h6Om/eGZzgHY10EZbgwPzc9xPF4 uF0FjN/yx6yo7OpnRbzbUO4xUVO/UI2XdxuAUjjIN+RLYwpo9uur1ntC/gm+2fnuEKYQYbRRvn6dJp cPUlWfUur+15Du6PWZqcKiCj4wwGu/A== ARC-Authentication-Results: i=1; outbound4.ore.mailhop.org; spf=softfail smtp.mailfrom=freebsd.org smtp.remote-ip=67.177.211.60; dmarc=none header.from=freebsd.org; arc=none header.oldest-pass=0; DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=outbound.mailhop.org; s=dkim-high; h=content-transfer-encoding:mime-version:content-type:references:in-reply-to: date:cc:to:from:subject:message-id:from; bh=9aPUm6ZvhzAhZkunctYJL3HQVD8J+T/uyu54Xa8o/cI=; b=ssDzotfq10J2qK569bgsUJ98K9JzTFQPbQQWsRrl++36ufCGNSzhKeBQaA0BH3/K6u+y8KdpOqE1T 0ypxCZnZHkOtEllzW+Hel83pZzJr/9HbqIdLVakrVxYG94oVkYGx1+C0aegZRqpEhT575PDjSUk0nj H+tPFuAY3+wFZn4XQxuKzvF+n+QrePgdAhYk/agiJdTDHC5Q1/GoeDMcYugXKKBi/tkTG4jS4F6gmZ UPgO2je4v12yljt1IM869bVxnJevPSDq6u6tnNdHFGQWxrb3EDvsFtRntG9kCw1+IulReYKZ90HpFR GpDZgBU+N5G9uN49tjLzx3OBJKYQ2wQ== X-Originating-IP: 67.177.211.60 X-MHO-RoutePath: aGlwcGll X-MHO-User: 562bbb1b-0b36-11ec-a658-89389772cfc7 X-Report-Abuse-To: https://support.duocircle.com/support/solutions/articles/5000540958-duocircle-standard-smtp-abuse-information X-Mail-Handler: DuoCircle Outbound SMTP Received: from ilsoft.org (c-67-177-211-60.hsd1.co.comcast.net [67.177.211.60]) by outbound4.ore.mailhop.org (Halon) with ESMTPSA id 562bbb1b-0b36-11ec-a658-89389772cfc7; Wed, 01 Sep 2021 15:07:35 +0000 (UTC) Received: from [172.22.42.84] (rev2.hippie.lan [172.22.42.84]) by ilsoft.org (8.15.2/8.15.2) with ESMTP id 181F7XR1088086; Wed, 1 Sep 2021 09:07:33 -0600 (MDT) (envelope-from ian@freebsd.org) X-Authentication-Warning: paranoia.hippie.lan: Host rev2.hippie.lan [172.22.42.84] claimed to be [172.22.42.84] Message-ID: <5d812d36e5291af35ecadf910a6d5f76b60f1441.camel@freebsd.org> Subject: Re: Patched gpsd and /dev/pps0 results in "sleeping thread" kernel panic From: Ian Lepore To: Warner Losh Cc: Craig Leres , FreeBSD Hackers Date: Wed, 01 Sep 2021 09:07:33 -0600 In-Reply-To: References: <5476ea21-9e8a-32f5-08ff-add46c02d910@freebsd.org> Content-Type: text/plain; charset="ISO-8859-1" User-Agent: Evolution 3.40.3 FreeBSD GNOME Team List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 4H06qh0zM9z4dCq X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Tue, 2021-08-31 at 22:35 -0600, Warner Losh wrote: > On Tue, Aug 31, 2021 at 9:25 PM Warner Losh wrote: > > > > > > > On Tue, Aug 31, 2021 at 9:13 PM Ian Lepore wrote: > > > > > On Tue, 2021-08-31 at 21:07 -0600, Warner Losh wrote: > > > > On Tue, Aug 31, 2021 at 8:11 PM Craig Leres > > > > wrote: > > > > > > > > > > > > > > On 8/31/21 6:54 PM, Warner Losh wrote: > > > > > > Can you post a traceback? We recently tightened up the > > > > > > sleeping > > > > > conditions > > > > > > and many of the panics are straight forward to fix > > > > > > > > > > Is this sufficient? > > > > > > > > > > > > > Maybe. I'm guessing that the pps driver isn't setting > > > > driver_mtx for > > > > the > > > > pps_state that it keeps, so when we're sleeping in pps_ioctl > > > > we're > > > > not > > > > dropping the the ppbus lock, which leads to this diagnostic. > > > > > > > > Warner > > > > > > > > > > > > > > > > > Exactly so, I've just been digging through the code.  It's not an > > > easy > > > fix because the ppbus pps driver doesn't have access to the ppbus > > > mutex > > > at the point where it's registering to be a pps driver. > > > > > > > I just came to the conclusion that either we'd need to create a > > call in > > ppbus > > to set this mutex, or to return its lock... > > > > > > > Craig, is there any chance you can configure your hardware so > > > that the > > > PPS signal is connected to either the CTS or DCD pin of a uart > > > (even > > > the same uart that's communicating with the gps receiver should > > > work if > > > you use DCD)?  You'll get jitter and accuracy performance > > > comparable to > > > the parallel port pin, and avoid this mutex/sleeping problem. > > > > > > > That's easier, but depending on the hardware might be harder :( > > > > Either I'm missing something (likely am), or this might fix it up, > or at least get away from the warning: > > https://reviews.freebsd.org/D31763 > > Note: I can't recall why ppbus has to be locked for this call. > This code dates from the very earliest days of locking and > so may do things simply because it seemed like a good idea > without a specific notion as to what that lock is protecting. If > so, the real fix may be to not take the lock in pps_ioctl at > all and maybe instead use a reference count (the most > often reason for 'a good idea' was to keep the device > from going away, though this is a parent lock, not a > child one so I'm less sure about that being the reason). > > Warner > > Yeah, figuring out whether it's safe to drop/re-aquire the ppsbus lock at this point was the harder part of the work; I'm not very familiar with the ppsbus stuff in general. Warner, I pointed out a small change needed to your proposed patch in D31763. In the past I've found that one can't propose a different patch in Phab without commandeering the entire change, so I just posted an inline comment with the couple extra lines of code needed, hoping you have more time today than I do to prepare a new patch. -- Ian > > Warner > > > > > > > -- Ian > > > > > > > > > > > > > >                 Craig > > > > > > > > > > toc2 4 # kgdb /boot/kernel/kernel /var/crash/vmcore.1 > > > > > GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD] > > > > > Copyright (C) 2021 Free Software Foundation, Inc. > > > > > License GPLv3+: GNU GPL version 3 or later > > > > > > > > > > This is free software: you are free to change and > > > > > redistribute it. > > > > > There is NO WARRANTY, to the extent permitted by law. > > > > > Type "show copying" and "show warranty" for details. > > > > > This GDB was configured as "x86_64-portbld-freebsd12.2". > > > > > Type "show configuration" for configuration details. > > > > > For bug reporting instructions, please see: > > > > > . > > > > > Find the GDB manual and other documentation resources online > > > > > at: > > > > >      . > > > > > > > > > > For help, type "help". > > > > > Type "apropos word" to search for commands related to > > > > > "word"... > > > > > Reading symbols from /boot/kernel/kernel... > > > > > Reading symbols from /boot/kernel.LBLNET/kernel.debug... > > > > > > > > > > Unread portion of the kernel message buffer: > > > > > Sleeping thread (tid 101408, pid 2046) owns a non-sleepable > > > > > lock > > > > > KDB: stack backtrace of thread 101408: > > > > > sched_switch() at sched_switch+0x630/frame 0xfffffe0070e4f760 > > > > > mi_switch() at mi_switch+0xd4/frame 0xfffffe0070e4f790 > > > > > sleepq_catch_signals() at sleepq_catch_signals+0x403/frame > > > > > 0xfffffe0070e4f7e0 > > > > > sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame > > > > > 0xfffffe0070e4f820 > > > > > _sleep() at _sleep+0x1b3/frame 0xfffffe0070e4f8a0 > > > > > pps_ioctl() at pps_ioctl+0x298/frame 0xfffffe0070e4f8f0 > > > > > ppsioctl() at ppsioctl+0x48/frame 0xfffffe0070e4f920 > > > > > devfs_ioctl() at devfs_ioctl+0xb0/frame 0xfffffe0070e4f970 > > > > > VOP_IOCTL_APV() at VOP_IOCTL_APV+0x7b/frame > > > > > 0xfffffe0070e4f9a0 > > > > > vn_ioctl() at vn_ioctl+0x16a/frame 0xfffffe0070e4fab0 > > > > > devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame > > > > > 0xfffffe0070e4fad0 > > > > > kern_ioctl() at kern_ioctl+0x2b7/frame 0xfffffe0070e4fb30 > > > > > sys_ioctl() at sys_ioctl+0xfa/frame 0xfffffe0070e4fc00 > > > > > amd64_syscall() at amd64_syscall+0x387/frame > > > > > 0xfffffe0070e4fd30 > > > > > fast_syscall_common() at fast_syscall_common+0xf8/frame > > > > > 0xfffffe0070e4fd30 > > > > > --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = > > > > > 0x8004c899a, rsp > > > > > = > > > > > 0x7fffdfdfc6a8, rbp = 0x7fffdfdfc730 --- > > > > > panic: sleeping thread > > > > > cpuid = 8 > > > > > time = 1630456142 > > > > > KDB: stack backtrace: > > > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > > > > > 0xfffffe005ab73ab0 > > > > > vpanic() at vpanic+0x17b/frame 0xfffffe005ab73b00 > > > > > panic() at panic+0x43/frame 0xfffffe005ab73b60 > > > > > propagate_priority() at propagate_priority+0x282/frame > > > > > 0xfffffe005ab73b90 > > > > > turnstile_wait() at turnstile_wait+0x30c/frame > > > > > 0xfffffe005ab73be0 > > > > > __mtx_lock_sleep() at __mtx_lock_sleep+0x199/frame > > > > > 0xfffffe005ab73c70 > > > > > ppcintr() at ppcintr+0x2a0/frame 0xfffffe005ab73c90 > > > > > ithread_loop() at ithread_loop+0x23c/frame 0xfffffe005ab73cf0 > > > > > fork_exit() at fork_exit+0x7e/frame 0xfffffe005ab73d30 > > > > > fork_trampoline() at fork_trampoline+0xe/frame > > > > > 0xfffffe005ab73d30 > > > > > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > > > > > Uptime: 17m20s > > > > > Dumping 607 out of 12240 > > > > > MB:..3%..11%..22%..32%..43%..51%..61%..72%..82%..93% > > > > > > > > > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > > > > > 55      /usr/src/sys/amd64/include/pcpu_aux.h: No such file > > > > > or > > > > > directory. > > > > > (kgdb) bt > > > > > #0  __curthread () at > > > > > /usr/src/sys/amd64/include/pcpu_aux.h:55 > > > > > #1  doadump (textdump=1) at ../../../kern/kern_shutdown.c:371 > > > > > #2  0xffffffff80b83ada in kern_reboot (howto=260) > > > > >      at ../../../kern/kern_shutdown.c:451 > > > > > #3  0xffffffff80b83f33 in vpanic (fmt=, > > > > > ap=) > > > > >      at ../../../kern/kern_shutdown.c:880 > > > > > #4  0xffffffff80b83d53 in panic (fmt=) > > > > >      at ../../../kern/kern_shutdown.c:807 > > > > > #5  0xffffffff80be7152 in propagate_priority > > > > > (td=0xfffff80196959740) > > > > >      at ../../../kern/subr_turnstile.c:228 > > > > > #6  0xffffffff80be7d1c in turnstile_wait > > > > > (ts=0xfffff8000357a780, > > > > >      owner=, queue=0) at > > > > > ../../../kern/subr_turnstile.c:785 > > > > > #7  0xffffffff80b62ca9 in __mtx_lock_sleep > > > > > (c=0xfffff80003969bd0, > > > > >      v=) at ../../../kern/kern_mutex.c:654 > > > > > #8  0xffffffff8086fcc0 in ppcintr (arg=0xfffff80003969b00) > > > > >      at ../../../dev/ppc/ppc.c:1546 > > > > > #9  0xffffffff80b4637c in intr_event_execute_handlers > > > > > (p= > > > > out>, > > > > >      ie=0xfffff800030d9d00) at ../../../kern/kern_intr.c:1143 > > > > > #10 ithread_execute_handlers (p=, > > > > > ie=0xfffff800030d9d00) > > > > >      at ../../../kern/kern_intr.c:1156 > > > > > #11 ithread_loop (arg=0xfffff800039aa7e0) at > > > > > ../../../kern/kern_intr.c:1236 > > > > > #12 0xffffffff80b42e1e in fork_exit ( > > > > >      callout=0xffffffff80b46140 , > > > > > arg=0xfffff800039aa7e0, > > > > >      frame=0xfffffe005ab73d40) at > > > > > ../../../kern/kern_fork.c:1080 > > > > > #13 > > > > > (kgdb) > > > > > > > > > > > > > >