From nobody Tue Jun 8 15:54:05 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id DB3827EF019 for ; Tue, 8 Jun 2021 15:54:07 +0000 (UTC) (envelope-from gljennjohn@gmail.com) Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4FzvtR5gcxz3Qb5 for ; Tue, 8 Jun 2021 15:54:07 +0000 (UTC) (envelope-from gljennjohn@gmail.com) Received: by mail-wm1-x32a.google.com with SMTP id h22-20020a05600c3516b02901a826f84095so2208598wmq.5 for ; Tue, 08 Jun 2021 08:54:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=b/t4ej+xUy5mUD/YGDHYlAAw27Q70Wuflf6ZXonwUto=; b=a4gdzB++Z0o+nMWPUwnDB/f2z2Vqbd3hANliEAJZvEy84pX3S1JWS5Yj7pTH//p7q3 GxusZse/WSuq85N82oM4gaSbUn9/385VU4Kn5OZUmmkzlTX9iW4pozdZ15LT2ThZ95mW zkk3Kzz5e0Scx2Utix23AgMcNVKXtP6CthLPY0WiNyOmWTpeNdO7DnjgdUBqXSDbrdeR DccFVi9+jJnF+JXi9llv5K9pYU1tyxmJ7XaOTRpYbHw3+hzUVecnvMOdceAgO/awfUcD ytI50aPCjGGRBvh54iBMav2Rn06T+3Z510TuaS/qFGT8bM9hbMSl7NjHQU/jev9VeCDS TneA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=b/t4ej+xUy5mUD/YGDHYlAAw27Q70Wuflf6ZXonwUto=; b=qkVbQs4KOu3MmbSg08gWEBUEryCeCb2tAeMTPZNbgZem668LmUP/aPS26fUndPc+lg rrPA/3GOdbkN9iYJaJC/hqmBVZvQcuZ7ez4/OvgJc0Dv0iN3oVX+tWaruJpsJcHMZQpK chEEvnnGRB97tRlSJOAZ/t/zm/sXdYEqK/MMN7Ta8ATLpx93iDtY/nMzRkorEJCiYLuT ciItglNHexWfXFf5gszYOKFhyY9qTe/B/HiAIRJqK9xfnRi+g7nFIrRV7P5J0AmS7Pyz ktHAJddVaPJov0ksPu3x4uJ1w3xc92kf77iULZkCBqz0x3M1n78j2izuFlC0aQt+tplV vROg== X-Gm-Message-State: AOAM530dJNUBiW4SqPo9Ra2Y2ZPJweyABu3d6cP765LLQBbUt12wdLIN KsfCPJymCYOFmeuCumj1TV2lb8YFAA4= X-Google-Smtp-Source: ABdhPJzBaDtQai5+2FxKh0j/ILtwX6KV19Z3s2EEdy+wZh46m4lFdAWf4EfnkiO851260vgeDbpc8g== X-Received: by 2002:a1c:f717:: with SMTP id v23mr4947038wmh.32.1623167646197; Tue, 08 Jun 2021 08:54:06 -0700 (PDT) Received: from ernst.home (pd9e2360f.dip0.t-ipconnect.de. [217.226.54.15]) by smtp.gmail.com with ESMTPSA id r7sm4345243wma.9.2021.06.08.08.54.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Jun 2021 08:54:05 -0700 (PDT) Date: Tue, 8 Jun 2021 17:54:05 +0200 From: Gary Jennejohn To: Warner Losh Cc: FreeBSD Current Subject: Re: kernel panic while copying files Message-ID: <20210608155405.5cf0e200@ernst.home> In-Reply-To: References: <20210607090109.08ecb130@ernst.home> <20210608084646.6a7e1bc7@ernst.home> Reply-To: gljennjohn@gmail.com X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; amd64-portbld-freebsd14.0) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4FzvtR5gcxz3Qb5 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Tue, 8 Jun 2021 06:27:04 -0600 Warner Losh wrote: > On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn wrote: > [snip old stuff] > > Here the kgdb backtrace: > > > > Unread portion of the kernel message buffer: > > panic: Duplicate free of 0xfffff800356b9000 from zone > > 0xfffffe00dcbdd800(da_ccb) slab 0xfffff800356b9fd8(0) > > cpuid = 8 > > time = 1623140519 > > KDB: stack backtrace: > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > > 0xfffffe00c5f398c0 > > vpanic() at vpanic+0x181/frame 0xfffffe00c5f39910 > > panic() at panic+0x43/frame 0xfffffe00c5f39970 > > uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfffffe00c5f399b0 > > uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfffffe00c5f39a00 > > camperiphdone() at camperiphdone+0x1b7/frame 0xfffffe00c5f39b20 > > xpt_done_process() at xpt_done_process+0x3dd/frame 0xfffffe00c5f39b60 > > xpt_done_td() at xpt_done_td+0xf5/frame 0xfffffe00c5f39bb0 > > fork_exit() at fork_exit+0x80/frame 0xfffffe00c5f39bf0 > > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c5f39bf0 > > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > > KDB: enter: panic > > > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > > 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" > > (offsetof(struct pcpu, > > (kgdb) bt > > #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > > #1 doadump (textdump=textdump@entry=0) > > at /usr/src/sys/kern/kern_shutdown.c:399 > > #2 0xffffffff8040c39a in db_dump (dummy=, > > dummy2=, dummy3=, dummy4=) > > at /usr/src/sys/ddb/db_command.c:575 > > #3 0xffffffff8040c192 in db_command (last_cmdp=, > > cmd_table=, dopager=dopager@entry=1) > > at /usr/src/sys/ddb/db_command.c:482 > > #4 0xffffffff8040beed in db_command_loop () > > at /usr/src/sys/ddb/db_command.c:535 > > #5 0xffffffff8040f616 in db_trap (type=, code= > out>) > > at /usr/src/sys/ddb/db_main.c:270 > > #6 0xffffffff8066b1c4 in kdb_trap (type=type@entry=3, code=code@entry=0, > > tf=, tf@entry=0xfffffe00c5f397f0) > > at /usr/src/sys/kern/subr_kdb.c:727 > > #7 0xffffffff809a4e96 in trap (frame=0xfffffe00c5f397f0) > > at /usr/src/sys/amd64/amd64/trap.c:604 > > #8 > > #9 kdb_enter (why=0xffffffff80a61a23 "panic", msg=) > > at /usr/src/sys/kern/subr_kdb.c:506 > > #10 0xffffffff806207a2 in vpanic (fmt=, ap=, > > ap@entry=0xfffffe00c5f39950) at /usr/src/sys/kern/kern_shutdown.c:907 > > #11 0xffffffff80620533 in panic ( > > fmt=0xffffffff80d635c8 ".\024\244\200\377\377\377\377") > > at /usr/src/sys/kern/kern_shutdown.c:843 > > #12 0xffffffff808e12b1 in uma_dbg_free (zone=0xfffffe00dcbdd800, > > slab=0xfffff800356b9fd8, item=0xfffff800356b9000) > > at /usr/src/sys/vm/uma_core.c:5664 > > #13 0xffffffff808d9de7 in item_dtor (zone=0xfffffe00dcbdd800, > > item=0xfffff800356b9000, size=544, udata=0x0, skip=SKIP_NONE) > > at /usr/src/sys/vm/uma_core.c:3418 > > #14 uma_zfree_arg (zone=0xfffffe00dcbdd800, item=0xfffff800356b9000, > > udata=udata@entry=0x0) at /usr/src/sys/vm/uma_core.c:4374 > > #15 0xffffffff802da503 in uma_zfree (zone=0xffffffff80d635c8 , > > item=0x200) at /usr/src/sys/vm/uma.h:404 > > > > OK. This is a bad stack trace. camperiphdone doesn't call uma_zfree()... > It does call > xpt_free_ccb, though, and that's likely what's going wrong. And that > matches the line > numbers. Most likely this is llvm's tail call optimizations... Can you > compile the kernel > either -O0 or with -fno-optimize-sibling-calls? That will give a better > call stack. > > However, it's likely the new UMA stuff trasz committed (or it's providing > better > diagnostics than the old malloc based code which seems more likely) that > can be > disabled by the tunable kern.cam.da.enable_uma_ccbs=0. > > The lines in question: > saved_ccb = (union ccb *)done_ccb->ccb_h.saved_ccb_ptr; > bcopy(saved_ccb, done_ccb, sizeof(*done_ccb)); > xpt_free_ccb(saved_ccb); > > So we overwrite the done_ccb with the saved_ccb's contents and then free > the saved ccb. > That's likely OKish, though. > > We copy entire CCBs around in this code a lot, and I've not traced through > it. But we're > sending a scsi start unit in response to some error that is being reported > via cam_periph_error() > > #16 0xffffffff802d9117 in camperiphdone (periph=0xfffff800061e2c00, > > done_ccb=0xfffff800355d6cc0) at /usr/src/sys/cam/cam_periph.c:1427 > > #17 0xffffffff802dfebd in xpt_done_process (ccb_h=0xfffff800355d6cc0) > > at /usr/src/sys/cam/cam_xpt.c:5491 > > #18 0xffffffff802e1ec5 in xpt_done_td ( > > arg=arg@entry=0xffffffff80d33d80 ) > > at /usr/src/sys/cam/cam_xpt.c:5546 > > #19 0xffffffff805dad80 in fork_exit (callout=0xffffffff802e1dd0 > > , > > arg=0xffffffff80d33d80 , frame=0xfffffe00c5f39c00) > > at /usr/src/sys/kern/kern_fork.c:1083 > > #20 > > > > Apparently caused by recent changes to CAM. > > > > Let me know if you want more information. > > > > what's *periph say? > Well, flags is CAM_PERIPH_RECOVERY_INPROG|CAM_PERIPH_LOCK_WANTED |CAM_PERIPH_LOCKED. Here's the entire dump: (kgdb) p/x *(struct cam_periph *)periph $1 = {periph_start = 0xffffffff80328a00, periph_oninval = 0xffffffff803287e0, periph_dtor = 0xffffffff803288a0, periph_name = 0xffffffff80e545d7, path = 0xfffff8000611cdc0, softc = 0xfffffe01117ca000, sim = 0xfffff80005873600, unit_number = 0x0, type = 0x0, flags = 0x26, scheduled_priority = 0xffffffff, immediate_priority = 0xffffffff, periph_allocating = 0x0, periph_allocated = 0x1, refcount = 0x5, ccb_list = { slh_first = 0x0}, periph_links = {sle_next = 0xfffff8005d68be00}, unit_links = {tqe_next = 0x0, tqe_prev = 0xffffffff810530a8}, deferred_callback = 0x0, deferred_ac = 0x0, periph_run_task = {ta_link = { stqe_next = 0x0}, ta_pending = 0x0, ta_priority = 0x0, ta_flags = 0x0, ta_func = 0xffffffff802dd4d0, ta_context = 0xfffff8005d68bd00}, ccb_zone = 0xfffffe00dc9d9800} path does not seem very useful because it contains a bunch of pointers. (kgdb) x/s ((struct cam_periph *)periph)->periph_name 0xffffffff80e545d7: "da" > camperiphdone is only use when we send a scsi start/stop is sent to try to > recover > in a couple of cases: > switch(ccb->csio.scsi_status) { ... > case SCSI_STATUS_CMD_TERMINATED: > case SCSI_STATUS_CHECK_COND: > error = camperiphscsisenseerror(ccb, orig_ccb, > ... > is the path. It might be profitable to use the debugger to trap here to see > where we're > kicking off this error recovery process. > Ok, I set a breakpoint in ddb at camperiphdone. Once ddb was in that routine I set a breakpoint in camperiphscsisenseerror, but that was never hit. So, apparently that routine is not being entered and the kernel paniced. With kern.cam.da.enable_uma_ccbs=0 there is no panic. Here's the kgdb backtrace with the -O0 kernel: (kgdb) bt #0 0xffffffff8081d706 in doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:398 #1 0xffffffff804ef15a in db_dump (dummy=-2138500043, dummy2=false, dummy3=-1, dummy4=0xfffffe00c62a11b0 "") at /usr/src/sys/ddb/db_command.c:575 #2 0xffffffff804eef5f in db_command ( last_cmdp=0xffffffff8114d380 , cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:482 #3 0xffffffff804eeb38 in db_command_loop () at /usr/src/sys/ddb/db_command.c:535 #4 0xffffffff804f38ef in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:270 #5 0xffffffff80891d02 in kdb_trap (type=3, code=0, tf=0xfffffe00c62a1680) at /usr/src/sys/kern/subr_kdb.c:727 #6 0xffffffff80dd53c3 in trap (frame=0xfffffe00c62a1680) at /usr/src/sys/amd64/amd64/trap.c:604 #7 0xffffffff80dd6718 in trap_check (frame=0xfffffe00c62a1680) at /usr/src/sys/amd64/amd64/trap.c:664 #8 #9 breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66 #10 0xffffffff808910d0 in kdb_enter (why=0xffffffff80eaaf0b "panic", msg=0xffffffff80eaaf0b "panic") at /usr/src/sys/kern/subr_kdb.c:505 #11 0xffffffff8081dbfe in vpanic ( fmt=0xffffffff80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)", ap=0xfffffe00c62a1850) at /usr/src/sys/kern/kern_shutdown.c:906 #12 0xffffffff8081d6b0 in panic ( fmt=0xffffffff80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)") at /usr/src/sys/kern/kern_shutdown.c:843 #13 0xffffffff80caaec5 in uma_dbg_free (zone=0xfffffe00dc9d9800, slab=0xfffff80007ee0fd8, item=0xfffff80007ee0000) at /usr/src/sys/vm/uma_core.c:5664 #14 0xffffffff80c9faf5 in item_dtor (zone=0xfffffe00dc9d9800, item=0xfffff80007ee0000, size=544, udata=0x0, skip=SKIP_NONE) at /usr/src/sys/vm/uma_core.c:3418 #15 0xffffffff80c9eec7 in uma_zfree_arg (zone=0xfffffe00dc9d9800, item=0xfffff80007ee0000, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374 #16 0xffffffff802e5a89 in uma_zfree (zone=0xfffffe00dc9d9800, item=0xfffff80007ee0000) at /usr/src/sys/vm/uma.h:404 #17 0xffffffff802dcfa6 in xpt_free_ccb (free_ccb=0xfffff80007ee0000) at /usr/src/sys/cam/cam_xpt.c:4674 #18 0xffffffff802db639 in camperiphdone (periph=0xfffff8005d68bd00, done_ccb=0xfffff80007797cc0) at /usr/src/sys/cam/cam_periph.c:1427 #19 0xffffffff802e59b6 in xpt_done_process (ccb_h=0xfffff80007797cc0) at /usr/src/sys/cam/cam_xpt.c:5491 #20 0xffffffff802e811e in xpt_done_td (arg=0xffffffff81143c00 ) at /usr/src/sys/cam/cam_xpt.c:5546 #21 0xffffffff807ac0ea in fork_exit (callout=0xffffffff802e7f20 , arg=0xffffffff81143c00 , frame=0xfffffe00c62a1c00) at /usr/src/sys/kern/kern_fork.c:1083 #22 -- Gary Jennejohn