Date: Mon, 17 Feb 2020 13:33:29 -0700
From: Warner Losh <imp@bsdimp.com>
To: Larry Rosenman <ler@lerctr.org>
Cc: Freebsd current <freebsd-current@freebsd.org>
Subject: Re: Panic with ataintel and not ready CD on a Dell r710@r357958
Message-ID: <7F73C936-1F16-4D60-9FF6-2FA7C54909FE@gmail.com>
In-Reply-To: <2b8c652dad43a5950e74000b6ccd7fc5@lerctr.org>
References: <df6a74e1bf7e5cdd128aa656c93ec4b5@lerctr.org> <2b8c652dad43a5950e74000b6ccd7fc5@lerctr.org>

> On Feb 17, 2020, at 1:18 PM, Larry Rosenman <ler@lerctr.org> wrote:
>
> On 02/17/2020 1:46 pm, Larry Rosenman wrote:
>> Unread portion of the kernel message buffer:
>> panic: aprobe1: freed with 1 active CCBs
>> cpuid = 22
>> time = 1581771571
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01fb9a11a0
>> vpanic() at vpanic+0x185/frame 0xfffffe01fb9a1200
>> panic() at panic+0x43/frame 0xfffffe01fb9a1260
>> cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x372/frame 0xfffffe01fb9a1780
>> cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfffffe01fb9a17a0
>> probedone() at probedone+0x186/frame 0xfffffe01fb9a1c60
>> xpt_done_process() at xpt_done_process+0x358/frame 0xfffffe01fb9a1ca0
>> xpt_done_td() at xpt_done_td+0xf5/frame 0xfffffe01fb9a1cf0
>> fork_exit() at fork_exit+0x80/frame 0xfffffe01fb9a1d30
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01fb9a1d30
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> Uptime: 1m8s
>> Dumping 6077 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
>> (offsetof(struct pcpu,
>> (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:393
>> #2 0xffffffff804bdf80 in kern_reboot (howto=260)
>>     at /usr/src/sys/kern/kern_shutdown.c:480
>> #3 0xffffffff804be3dd in vpanic (fmt=<optimized out>, ap=<optimized out>)
>>     at /usr/src/sys/kern/kern_shutdown.c:910
>> #4 0xffffffff804be133 in panic (fmt=<unavailable>)
>>     at /usr/src/sys/kern/kern_shutdown.c:836
>> #5 0xffffffff823c5bc2 in camperiphfree (periph=0xfffff80115da2300)
>>     at /usr/src/sys/cam/cam_periph.c:685
>> #6 cam_periph_release_locked_buses (periph=0xfffff80115da2300)
>>     at /usr/src/sys/cam/cam_periph.c:450
>> #7 0xffffffff823c5bfb in cam_periph_release_locked (periph=0xfffff80115da2300)
>>     at /usr/src/sys/cam/cam_periph.c:461
>> #8 0xffffffff8240dce6 in probedone (periph=0xfffff80115da2300,
>>     done_ccb=<optimized out>) at /usr/src/sys/cam/ata/ata_xpt.c:1352
>> #9 0xffffffff823cee08 in xpt_done_process (ccb_h=0xfffff8015013e800)
>>     at /usr/src/sys/cam/cam_xpt.c:5488
>> #10 0xffffffff823d0db5 in xpt_done_td (arg=0xffffffff8243d780 <cam_doneqs+128>)
>>     at /usr/src/sys/cam/cam_xpt.c:5515
>> #11 0xffffffff80483200 in fork_exit (callout=0xffffffff823d0cc0 <xpt_done_td>,
>>     arg=0xffffffff8243d780 <cam_doneqs+128>, frame=0xfffffe01fb9a1d40)
>>     at /usr/src/sys/kern/kern_fork.c:1059
>> #12 <signal handler called>
>> (kgdb)
>> Core IS available as is the kernel
>> I do load the ataintel driver as a module. Removing it allows me to boot.
>> What info do you all need?
>
> Forgot to include, the previous working version was r356506

I’ve fixed this in r357969 which reverted r357897. Looks like you tried 11 revs too soon. The commit message for r357969 says it all:

The KASSERT is too strict: revert r357897

It's valid for a periph to be removed with outstanding transactions on the device. In CAM, multiple periphs attach to a single device. There's no interlock to prevent one of these going away while other periphs have outstanding CCBs and it's not an error either. Remove this overly aggressive KASSERT to prevent false-positive panics when devices depart.

Sorry for the hassle. I’ve been trying to find a way to trap a race that we’re seeing at work sooner, and I thought this was good, but I tested my kernel on a non-invariants tree so thought it was cool, only to discover a little later it wasn’t. :(

Warner
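
The reasoning in the commit message can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: the toy_* structures and functions below are hypothetical and are not the real CAM data structures or the reverted KASSERT. The model assumes the strict check looked at a device-wide count of outstanding CCBs, which is how the commit message describes the problem. The periph name "other0" is also made up; "aprobe1" is taken from the panic string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model; NOT the real CAM structures. */
struct toy_device {
	int active_ccbs;	/* outstanding CCBs from ALL periphs on the device */
};

struct toy_periph {
	const char *name;
	struct toy_device *dev;	/* device shared with other periphs */
	int own_ccbs;		/* CCBs issued by this periph only */
};

/* A periph issues a CCB: both its own count and the device count go up. */
void toy_ccb_start(struct toy_periph *p)
{
	p->own_ccbs++;
	p->dev->active_ccbs++;
}

/* A CCB issued by this periph completes. */
void toy_ccb_done(struct toy_periph *p)
{
	assert(p->own_ccbs > 0);
	p->own_ccbs--;
	p->dev->active_ccbs--;
}

/*
 * Returns 1 if a "device must be idle when a periph is freed" check
 * (the assumed shape of the too-strict assertion) would fire for p.
 */
int toy_strict_check_fires(const struct toy_periph *p)
{
	return p->dev->active_ccbs != 0;
}

/*
 * Two periphs share one device. "other0" has a CCB in flight while
 * "aprobe1" goes away with none of its own. Returns 1 when the strict
 * check fires even though the departing periph owns no CCBs.
 */
int toy_false_positive(void)
{
	struct toy_device dev = { 0 };
	struct toy_periph probe = { "aprobe1", &dev, 0 };
	struct toy_periph other = { "other0", &dev, 0 };
	int fires, result;

	toy_ccb_start(&other);			/* other0 is busy */
	fires = toy_strict_check_fires(&probe);	/* aprobe1 is being freed */
	if (fires)
		printf("%s: freed with %d active CCBs\n",
		    probe.name, dev.active_ccbs);
	result = fires && probe.own_ccbs == 0;
	toy_ccb_done(&other);			/* other0's CCB completes normally */
	return result;
}
```

Calling toy_false_positive() from a main() shows the strict check firing for a periph that has no outstanding CCBs of its own, which is the kind of false positive the revert removes.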