From nobody Wed Jun 30 16:35:14 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 752B911D5F0E for ; Wed, 30 Jun 2021 16:35:27 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qt1-x82a.google.com (mail-qt1-x82a.google.com [IPv6:2607:f8b0:4864:20::82a]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4GFRlz25p9z4qrD for ; Wed, 30 Jun 2021 16:35:27 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-qt1-x82a.google.com with SMTP id d9so1883343qtx.8 for ; Wed, 30 Jun 2021 09:35:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=pO0jQdTjfEFUUuY2YwAti9SAKZH/Mkm0UYO50lUakbE=; b=eEbUIG4ZMlLDVyC6ZGo3bDPbllrSTXIx6fd3Ye1UamMEpzx7e/LbizFjfDfQHdGhIg e94/7Z0MqqgrCc2IkGxe6+OOZyZyX4mEJ1A64DPVAjEeyxycj/LzMzTYw85drf3O5i1K hBChIVbIpD3mgOKsMinrsJh0jez/bB/ddiKAdZZHuTRmVEVbQrfe3/Fes7iyLowdBZao od6VNgaKdG5Dj8KZuJNk0CNEcfUNq+dYlnTltG7hbS3uDLEsRwokWrVLJwer1ekEqFvX PkUSDdfTH7hFBeU5JJNQmcmx4t/3wiDjBQ7Y3OOQuhQJfn2d5XHSTh1tTQxV6Kld63gw 0tIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=pO0jQdTjfEFUUuY2YwAti9SAKZH/Mkm0UYO50lUakbE=; b=JE6w7eJ6sCmDUAnOQ5y9HsSh1EB6cnCt1f/qxzbhHGrxzLlU7gSCjImFVpOw7+aL3e kZjCUmFgxSOiU3u3qJWTFWVdFWEC7tRJpgptjnJpU0xAon+rcO9+DyPdIv7bhsn2cl4b uof/GWJPcuOZi0gJJlpaXc15p3SK7FNK7mHDzyfDcMxeY9Y7sNtyRpL3AqSL9PhOWQgG sWQuT7rPu4966rXo1yEJQ5k3EAf/UWqOVLJ+7vVEYPeXXDElpwRCEJeMsM5rbtATgzOI E8NRCLKWRl8Pqsr0+YdRMSqdux+qIdvS/Z9dcku0bY4KW7lzAIaAYG2butHKOr1uoKzM DBag== X-Gm-Message-State: AOAM533b0ZDZjKtUmsCc/8vZ+pbdYYOzk/nK2HBnlP/tQN8KiNcjTTx9 J/8Jm2BONotrhK93oh7qlht2ry9OE/z5UcHb7+g3ZA== X-Google-Smtp-Source: ABdhPJwlSIM+jL6Qr26lqIhwghrFXdUEtwuyKSUZCxa/Ud4bt7SWCyL8BGLeWa8V9rs6VwodIIzQEMRQTjUlp1zj9us= X-Received: by 2002:ac8:5d8d:: with SMTP id d13mr31740460qtx.49.1625070924971; Wed, 30 Jun 2021 09:35:24 -0700 (PDT) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 References: <20210607090109.08ecb130@ernst.home> <20210608084646.6a7e1bc7@ernst.home> <20210608155405.5cf0e200@ernst.home> <20210610095041.38d7597c@ernst.home> <20210629094201.77ef5f22@ernst.home> <20210630125703.2b5544e7@ernst.home> In-Reply-To: <20210630125703.2b5544e7@ernst.home> From: Warner Losh Date: Wed, 30 Jun 2021 10:35:14 -0600 Message-ID: Subject: Re: panic: Unaligned free (was: kernel panic while copying files) To: Gary Jennejohn Cc: FreeBSD Current Content-Type: multipart/alternative; boundary="00000000000037d0db05c5fe4fec" X-Rspamd-Queue-Id: 4GFRlz25p9z4qrD X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: Y --00000000000037d0db05c5fe4fec Content-Type: text/plain; charset="UTF-8" On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn wrote: > On Wed, 30 Jun 2021 06:02:59 +0100 > Graham Perrin wrote: > > > On 29/06/2021 10:42, Gary Jennejohn wrote: > > > ___ panic is now the result of an unaligned free. > > > > > > panic: Unaligned free of 0xfffff800259e2800 from zone > > > 0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3) > > > > > > I have the crash dump and a debug kernel in case anyone wants more > info. > > Can you post the backtrace etc. here? Thanks > > > > Sure. As can be seen from the uma zone being da_ccb, the panic > resulted from setting kern.cam.da.enable_uma_ccbs=1. > > Unread portion of the kernel message buffer: > panic: Unaligned free of 0xfffff800259e2800 from zone > 0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3) > cpuid = 2 > time = 1624958650 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame > 0xfffffe00c62687a0 > kdb_backtrace() at kdb_backtrace+0x46/frame 0xfffffe00c6268850 > vpanic() at vpanic+0x227/frame 0xfffffe00c62688f0 > panic() at panic+0x4e/frame 0xfffffe00c6268950 > uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfffffe00c62689a0 > item_dtor() at item_dtor+0x7c/frame 0xfffffe00c62689e0 > uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfffffe00c6268a50 > uma_zfree() at uma_zfree+0x23/frame 0xfffffe00c6268a70 > xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfffffe00c6268a90 > camperiphdone() at camperiphdone+0x211/frame 0xfffffe00c6268ae0 > xpt_done_process() at xpt_done_process+0x550/frame 0xfffffe00c6268b40 > xpt_done_td() at xpt_done_td+0x1c0/frame 0xfffffe00c6268b80 > fork_exit() at fork_exit+0x117/frame 0xfffffe00c6268bf0 > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c6268bf0 > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > KDB: enter: panic > > doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399 > 399 dumptid = curthread->td_tid; > (kgdb) bt > #0 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399 > #1 0xffffffff804d5dd7 in db_dump (dummy=-2138843371, dummy2=false, > dummy3=-1, > dummy4=0xfffffe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575 > #2 0xffffffff804d5bf4 in db_command ( > last_cmdp=0xffffffff8114ce80 , cmd_table=0x0, > dopager=1) > at /usr/src/sys/ddb/db_command.c:482 > #3 0xffffffff804d583c in db_command_loop () > at /usr/src/sys/ddb/db_command.c:535 > #4 0xffffffff804da27c in db_trap (type=3, code=0) > at /usr/src/sys/ddb/db_main.c:270 > #5 0xffffffff8083df9d in kdb_trap (type=3, code=0, tf=0xfffffe00c6268770) > at /usr/src/sys/kern/subr_kdb.c:727 > #6 0xffffffff80d31494 in trap (frame=0xfffffe00c6268770) > at /usr/src/sys/amd64/amd64/trap.c:604 > #7 0xffffffff80d32628 in trap_check (frame=0xfffffe00c6268770) > at /usr/src/sys/amd64/amd64/trap.c:664 > #8 > #9 breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66 > #10 0xffffffff8083d3d0 in kdb_enter (why=0xffffffff80e0355b "panic", > msg=0xffffffff80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505 > #11 0xffffffff807d1725 in vpanic ( > fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab > %p(%d)", ap=0xfffffe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906 > #12 0xffffffff807d120e in panic ( > fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab > %p(%d)") > at /usr/src/sys/kern/kern_shutdown.c:843 > #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000, > slab=0xfffff800259e2fd8, item=0xfffff800259e2800) > at /usr/src/sys/vm/uma_core.c:5659 > #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000, > item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE) > at /usr/src/sys/vm/uma_core.c:3418 > #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000, > item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374 > #16 0xffffffff802e45d3 in uma_zfree (zone=0xfffffe00dc9d2000, > item=0xfffff800259e2800) at /usr/src/sys/vm/uma.h:404 > #17 0xffffffff802dc3c3 in xpt_free_ccb (free_ccb=0xfffff800259e2800) > at /usr/src/sys/cam/cam_xpt.c:4676 > #18 0xffffffff802dacf1 in camperiphdone (periph=0xfffff80025329b00, > done_ccb=0xfffff80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427 > #19 0xffffffff802e4520 in xpt_done_process (ccb_h=0xfffff80025a24cc0) > at /usr/src/sys/cam/cam_xpt.c:5493 > #20 0xffffffff802e68e0 in xpt_done_td (arg=0xffffffff81143700 ) > at /usr/src/sys/cam/cam_xpt.c:5548 > #21 0xffffffff807673c7 in fork_exit (callout=0xffffffff802e6720 > , > arg=0xffffffff81143700 , frame=0xfffffe00c6268c00) > at /usr/src/sys/kern/kern_fork.c:1083 > #22 > > [kgdb stuff removed] > > (kgdb) down > #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000, > item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374 > 4374 item_dtor(zone, item, cache_uz_size(cache), udata, > SKIP_NONE); > (kgdb) down > #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000, > item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE) > at /usr/src/sys/vm/uma_core.c:3418 > 3418 uma_dbg_free(zone, NULL, item); > (kgdb) p/x skipdbg > $26 = 0x0 > (kgdb) p/x zone->uz_flags > $27 = 0x41000000 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR) > (kgdb) down > #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000, > slab=0xfffff800259e2fd8, item=0xfffff800259e2800) > at /usr/src/sys/vm/uma_core.c:5659 > 5659 panic("Unaligned free of %p from zone %p(%s) slab > %p(%d)", > > Note that item is the same as saved_ccb. > > (kgdb) p/x *zone->uz_keg > $28 = {uk_hash = {uh_slab_hash = 0x0, uh_hashsize = 0x0, uh_hashmask = > 0x0}, > uk_zones = {lh_first = 0xfffffe00dc9d2000}, uk_dr = { > dr_policy = 0xffffffff810010e0, dr_iter = 0x0}, uk_align = 0x7, > uk_reserve = 0x0, uk_size = 0x220, uk_rsize = 0x220, > uk_init = 0xffffffff80c17d50, uk_fini = 0xffffffff80c17d80, > uk_allocf = 0xffffffff80d342f0, uk_freef = 0xffffffff80d346a0, > uk_offset = 0x0, uk_kva = 0x0, uk_pgoff = 0xfd8, uk_ppera = 0x1, > uk_ipers = 0x7, uk_flags = 0x0, uk_name = 0xffffffff80debbac, uk_link = { > le_next = 0xfffff80005968000, le_prev = 0xfffff80005968380}, > uk_domain = 0xfffff80005968240} > (kgdb) p/x *slab > $29 = {us_link = {le_next = 0xdeadc0dedeadc0de, le_prev = > 0xdeadc0dedeadc0de}, > us_freecount = 0xc0de, us_flags = 0xad, us_domain = 0xde, us_free = { > __bits = 0xfffff800259e2ff0}} > (kgdb) p/x *0xfffff800259e2ff0 > $30 = 0xdeadc0de > Don't know whether this matters, but slab seems to be unitialized. > (kgdb) p/x freei > $31 = 0x3 > > In any case, saved_ccb has an address which lies outside the range > covered by slab, i.e. freei is bigger than the number of entries in > slab. > > I suspect that the only way to really figure out what's going on is to > run the kernel in kgbd and set lots of breakpoints, > What's happening is this, I think. (1) We send a request. (2) It fails, so we send a start unit BUT we do weird things to copy the CCBs around, and the request from (1) and (2) have different allocations. Error recovery overwrites the original request with a new request after saving it off.... (3) start unit succeeds, we go to free one of the CCBs and it's marked incorrectly, triggering either this panic or the prior one we saw. These actions were fine when there was one allocator, but now that there are two more care must be taken, and that more care hasn't been taken yet, so kern.cam.da.enable_uma_ccbs=1 is unsafe for now and should not be used. kern.cam.ada.enable_uma_ccbs=1 is fine as its error recovery doesn't do these things. Warner --00000000000037d0db05c5fe4fec--