Date:      Mon, 17 Feb 2020 13:33:29 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Larry Rosenman <ler@lerctr.org>
Cc:        Freebsd current <freebsd-current@freebsd.org>
Subject:   Re: Panic with ataintel and not ready CD on a Dell r710@r357958
Message-ID:  <7F73C936-1F16-4D60-9FF6-2FA7C54909FE@gmail.com>
In-Reply-To: <2b8c652dad43a5950e74000b6ccd7fc5@lerctr.org>
References:  <df6a74e1bf7e5cdd128aa656c93ec4b5@lerctr.org> <2b8c652dad43a5950e74000b6ccd7fc5@lerctr.org>



> On Feb 17, 2020, at 1:18 PM, Larry Rosenman <ler@lerctr.org> wrote:
>
> On 02/17/2020 1:46 pm, Larry Rosenman wrote:
>> Unread portion of the kernel message buffer:
>> panic: aprobe1: freed with 1 active CCBs
>> cpuid = 22
>> time = 1581771571
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01fb9a11a0
>> vpanic() at vpanic+0x185/frame 0xfffffe01fb9a1200
>> panic() at panic+0x43/frame 0xfffffe01fb9a1260
>> cam_periph_release_locked_buses() at
>> cam_periph_release_locked_buses+0x372/frame 0xfffffe01fb9a1780
>> cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame
>> 0xfffffe01fb9a17a0
>> probedone() at probedone+0x186/frame 0xfffffe01fb9a1c60
>> xpt_done_process() at xpt_done_process+0x358/frame 0xfffffe01fb9a1ca0
>> xpt_done_td() at xpt_done_td+0xf5/frame 0xfffffe01fb9a1cf0
>> fork_exit() at fork_exit+0x80/frame 0xfffffe01fb9a1d30
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01fb9a1d30
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> Uptime: 1m8s
>> Dumping 6077 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
>> (offsetof(struct pcpu,
>> (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> #1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:393
>> #2  0xffffffff804bdf80 in kern_reboot (howto=260)
>>    at /usr/src/sys/kern/kern_shutdown.c:480
>> #3  0xffffffff804be3dd in vpanic (fmt=<optimized out>, ap=<optimized out>)
>>    at /usr/src/sys/kern/kern_shutdown.c:910
>> #4  0xffffffff804be133 in panic (fmt=<unavailable>)
>>    at /usr/src/sys/kern/kern_shutdown.c:836
>> #5  0xffffffff823c5bc2 in camperiphfree (periph=0xfffff80115da2300)
>>    at /usr/src/sys/cam/cam_periph.c:685
>> #6  cam_periph_release_locked_buses (periph=0xfffff80115da2300)
>>    at /usr/src/sys/cam/cam_periph.c:450
>> #7  0xffffffff823c5bfb in cam_periph_release_locked (periph=0xfffff80115da2300)
>>    at /usr/src/sys/cam/cam_periph.c:461
>> #8  0xffffffff8240dce6 in probedone (periph=0xfffff80115da2300,
>>    done_ccb=<optimized out>) at /usr/src/sys/cam/ata/ata_xpt.c:1352
>> #9  0xffffffff823cee08 in xpt_done_process (ccb_h=0xfffff8015013e800)
>>    at /usr/src/sys/cam/cam_xpt.c:5488
>> #10 0xffffffff823d0db5 in xpt_done_td (arg=0xffffffff8243d780 <cam_doneqs+128>)
>>    at /usr/src/sys/cam/cam_xpt.c:5515
>> #11 0xffffffff80483200 in fork_exit (callout=0xffffffff823d0cc0 <xpt_done_td>,
>>    arg=0xffffffff8243d780 <cam_doneqs+128>, frame=0xfffffe01fb9a1d40)
>>    at /usr/src/sys/kern/kern_fork.c:1059
>> #12 <signal handler called>
>> (kgdb)
>> Core IS available as is the kernel
>> I do load the ataintel driver as a module.  Removing it allows me to boot.
>> What info do you all need?
>
> Forgot to include, the previous working version was r356506

I’ve fixed this in r357969 which reverted r357897.

Looks like you tried 11 revs too soon. The commit message for r357969
says it all:

    The KASSERT is too strict: revert r357897

    It's valid for a periph to be removed with outstanding transactions on the
    device. In CAM, multiple periphs attach to a single device. There's no interlock
    to prevent one of these going away while other periphs have outstanding CCBs and
    it's not an error either. Remove this overly agressive KASSERT to prevent
    false-positive panics when devices depart.

Sorry for the hassle. I’ve been trying to find a way to trap a race that
we’re seeing at work sooner, and I thought this was good, but I tested my
kernel on a non-invariants tree so thought it was cool, only to discover
a little later it wasn’t. :(
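
For readers wondering why a non-invariants kernel hid the problem: KASSERT
checks are compiled in only when the kernel is built with INVARIANTS. The
sketch below is a userland illustration of that idea, not the code from
r357897. Every name in it (SKETCH_KASSERT, sketch_panic, sketch_periph,
sketch_periph_free) is hypothetical, and the panic is simulated with a
counter so the effect can be observed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these names exist in the FreeBSD tree. */
static int sketch_panics;	/* counts simulated panics instead of halting */

void
sketch_panic(const char *name, int active)
{
	sketch_panics++;
	printf("panic: %s: freed with %d active CCBs\n", name, active);
}

#ifdef INVARIANTS
/* With INVARIANTS the condition is evaluated and a failure "panics". */
#define SKETCH_KASSERT(cond, name, active) do {		\
	if (!(cond))					\
		sketch_panic((name), (active));		\
} while (0)
#else
/* Without INVARIANTS the check expands to nothing at all. */
#define SKETCH_KASSERT(cond, name, active) do { } while (0)
#endif

struct sketch_periph {
	const char *name;
	int active_ccbs;	/* CCBs still outstanding on the device */
};

/* Returns true if freeing the periph tripped the simulated assertion. */
bool
sketch_periph_free(const struct sketch_periph *p)
{
	int before = sketch_panics;

	(void)p;	/* unused when the assertion is compiled out */
	SKETCH_KASSERT(p->active_ccbs == 0, p->name, p->active_ccbs);
	return (sketch_panics != before);
}
```

Built as-is, freeing a periph with one active CCB passes silently; built
with -DINVARIANTS, the same call reports the simulated panic. That is the
difference between the test kernel and the kernel that panicked here.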

Warner


