From nobody Thu Jul 1 11:30:26 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id D3D7311D441C for ; Thu, 1 Jul 2021 11:30:32 +0000 (UTC) (envelope-from gljennjohn@gmail.com) Received: from mail-ej1-x62a.google.com (mail-ej1-x62a.google.com [IPv6:2a00:1450:4864:20::62a]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4GFwxh0FP3z3Nn8 for ; Thu, 1 Jul 2021 11:30:31 +0000 (UTC) (envelope-from gljennjohn@gmail.com) Received: by mail-ej1-x62a.google.com with SMTP id v20so9785062eji.10 for ; Thu, 01 Jul 2021 04:30:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; bh=Qin+iXIpKrYd88K8lfmDLMot5VcqqbZKK9CbINmmObg=; b=AsCU87sTUnLw6k+ixFQjw7mTHL+dltvVrodMB4aicacrDpbc0QZnhNAP8kDY4nh3QI iZObNJ4bDHODikPZhmPbW4ShXtGeNHAzmtOG3kRt/IqdhBdNnrhepKSzfMiHR+pWyJC8 ET1F5rmZj0iSi+xgWQNojDdukBdk/jCLRYC8etZdc5bY/nGp5ZiWMy5dYCKUNgVGh6iK x+k5E5/EUSKf3winI6mKZI43vnCCeVeMz1i+wcjznT3jAfVjYLrggxohZMUOZL7ww74X okAKcGE28tJ9/DBvAogLi0Wmv7sgeabPz8wSnqogoeD8ui53gsTqxbvYGdZk2Qn8hN3X 0SNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:reply-to:mime-version:content-transfer-encoding; bh=Qin+iXIpKrYd88K8lfmDLMot5VcqqbZKK9CbINmmObg=; b=n+7dizde222ejt7aAOoCbfawYBqDPB6WnHbN0TygJOSRW1Y3GdhJp6mkenpvR/aHxv mxnDoqMd5+uXBcaQWutrCJt0zP7dBIS3RKL6051zSmTvi/CNqVgC2Rd3yEA8vMVSzoaN s114q9/uj7+fZmDjSw7yMXopvUyKa07fTCxyijTEivu0rtZAHvv62WROJ+Ebnr67vAmv QQ1epF795/YcCpMpLPau/QHigoI9/Qu4ai3tNF1J5Vwys1W/+y0/F0fInz9DYFSLaSAF eK96YeCEvSTVBOT35svbI8ig28+Jo6QoWft52Ks4UH6cbkMIZWc1N5iU8Z8kBujtBfwa lZ/Q== X-Gm-Message-State: AOAM530pYfnMkipJXo0HJuCNBY+K4WZ9JwA0W2Uncu+Ih6W9HLmnyKZk mOOHM/xj98yIVnfiFvbbE3VnodNcg/M= X-Google-Smtp-Source: ABdhPJyc065WxyewB59ax6iWLC/8+/3LYDePV2VS3bqyApGhlb7yL0XuNWIEWuQuJxJ/LDm4zxueKQ== X-Received: by 2002:a17:906:9be5:: with SMTP id de37mr18715449ejc.549.1625139030002; Thu, 01 Jul 2021 04:30:30 -0700 (PDT) Received: from ernst.home (pd9e2360f.dip0.t-ipconnect.de. [217.226.54.15]) by smtp.gmail.com with ESMTPSA id bx11sm10631040ejc.122.2021.07.01.04.30.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Jul 2021 04:30:29 -0700 (PDT) Date: Thu, 1 Jul 2021 13:30:26 +0200 From: Gary Jennejohn To: Warner Losh Cc: FreeBSD Current Subject: Re: panic: Unaligned free (was: kernel panic while copying files) Message-ID: <20210701113026.59f864e9@ernst.home> In-Reply-To: <20210701035800.410d2376@ernst.home> References: <20210607090109.08ecb130@ernst.home> <20210608084646.6a7e1bc7@ernst.home> <20210608155405.5cf0e200@ernst.home> <20210610095041.38d7597c@ernst.home> <20210629094201.77ef5f22@ernst.home> <20210630125703.2b5544e7@ernst.home> <20210701035800.410d2376@ernst.home> Reply-To: gljennjohn@gmail.com X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; amd64-portbld-freebsd14.0) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4GFwxh0FP3z3Nn8 X-Spamd-Bar: - Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20161025 header.b=AsCU87sT; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of gljennjohn@gmail.com designates 2a00:1450:4864:20::62a as permitted sender) smtp.mailfrom=gljennjohn@gmail.com X-Spamd-Result: default: False [-1.45 / 15.00]; HAS_REPLYTO(0.00)[gljennjohn@gmail.com]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36:c]; FREEMAIL_FROM(0.00)[gmail.com]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; NEURAL_HAM_SHORT(-0.93)[-0.926]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; RBL_DBL_DONT_QUERY_IPS(0.00)[2a00:1450:4864:20::62a:from]; DWL_DNSWL_NONE(0.00)[gmail.com:dkim]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; RECEIVED_SPAMHAUS_PBL(0.00)[217.226.54.15:received]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-0.42)[-0.419]; MIME_GOOD(-0.10)[text/plain]; FREEMAIL_REPLYTO(0.00)[gmail.com]; PREVIOUSLY_DELIVERED(0.00)[freebsd-current@freebsd.org]; NEURAL_SPAM_MEDIUM(0.90)[0.896]; SPAMHAUS_ZRD(0.00)[2a00:1450:4864:20::62a:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::62a:from]; RCVD_TLS_ALL(0.00)[]; MAILMAN_DEST(0.00)[freebsd-current] X-ThisMailContainsUnwantedMimeParts: N Gary Jennejohn wrote: > On Wed, 30 Jun 2021 10:35:14 -0600 > Warner Losh wrote: > > > On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn wrote: > > > > > On Wed, 30 Jun 2021 06:02:59 +0100 > > > Graham Perrin wrote: > > > > > > > On 29/06/2021 10:42, Gary Jennejohn wrote: > > > > > ___ panic is now the result of an unaligned free. > > > > > > > > > > panic: Unaligned free of 0xfffff800259e2800 from zone > > > > > 0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3) > > > > > > > > > > I have the crash dump and a debug kernel in case anyone wants more > > > info. > > > > Can you post the backtrace etc. here? Thanks > > > > > > > > > > Sure. As can be seen from the uma zone being da_ccb, the panic > > > resulted from setting kern.cam.da.enable_uma_ccbs=1. > > > > > > Unread portion of the kernel message buffer: > > > panic: Unaligned free of 0xfffff800259e2800 from zone > > > 0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3) > > > cpuid = 2 > > > time = 1624958650 > > > KDB: stack backtrace: > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame > > > 0xfffffe00c62687a0 > > > kdb_backtrace() at kdb_backtrace+0x46/frame 0xfffffe00c6268850 > > > vpanic() at vpanic+0x227/frame 0xfffffe00c62688f0 > > > panic() at panic+0x4e/frame 0xfffffe00c6268950 > > > uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfffffe00c62689a0 > > > item_dtor() at item_dtor+0x7c/frame 0xfffffe00c62689e0 > > > uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfffffe00c6268a50 > > > uma_zfree() at uma_zfree+0x23/frame 0xfffffe00c6268a70 > > > xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfffffe00c6268a90 > > > camperiphdone() at camperiphdone+0x211/frame 0xfffffe00c6268ae0 > > > xpt_done_process() at xpt_done_process+0x550/frame 0xfffffe00c6268b40 > > > xpt_done_td() at xpt_done_td+0x1c0/frame 0xfffffe00c6268b80 > > > fork_exit() at fork_exit+0x117/frame 0xfffffe00c6268bf0 > > > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c6268bf0 > > > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > > > KDB: enter: panic > > > > > > doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399 > > > 399 dumptid = curthread->td_tid; > > > (kgdb) bt > > > #0 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399 > > > #1 0xffffffff804d5dd7 in db_dump (dummy=-2138843371, dummy2=false, > > > dummy3=-1, > > > dummy4=0xfffffe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575 > > > #2 0xffffffff804d5bf4 in db_command ( > > > last_cmdp=0xffffffff8114ce80 , cmd_table=0x0, > > > dopager=1) > > > at /usr/src/sys/ddb/db_command.c:482 > > > #3 0xffffffff804d583c in db_command_loop () > > > at /usr/src/sys/ddb/db_command.c:535 > > > #4 0xffffffff804da27c in db_trap (type=3, code=0) > > > at /usr/src/sys/ddb/db_main.c:270 > > > #5 0xffffffff8083df9d in kdb_trap (type=3, code=0, tf=0xfffffe00c6268770) > > > at /usr/src/sys/kern/subr_kdb.c:727 > > > #6 0xffffffff80d31494 in trap (frame=0xfffffe00c6268770) > > > at /usr/src/sys/amd64/amd64/trap.c:604 > > > #7 0xffffffff80d32628 in trap_check (frame=0xfffffe00c6268770) > > > at /usr/src/sys/amd64/amd64/trap.c:664 > > > #8 > > > #9 breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66 > > > #10 0xffffffff8083d3d0 in kdb_enter (why=0xffffffff80e0355b "panic", > > > msg=0xffffffff80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505 > > > #11 0xffffffff807d1725 in vpanic ( > > > fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab > > > %p(%d)", ap=0xfffffe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906 > > > #12 0xffffffff807d120e in panic ( > > > fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab > > > %p(%d)") > > > at /usr/src/sys/kern/kern_shutdown.c:843 > > > #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000, > > > slab=0xfffff800259e2fd8, item=0xfffff800259e2800) > > > at /usr/src/sys/vm/uma_core.c:5659 > > > #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000, > > > item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE) > > > at /usr/src/sys/vm/uma_core.c:3418 > > > #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000, > > > item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374 > > > #16 0xffffffff802e45d3 in uma_zfree (zone=0xfffffe00dc9d2000, > > > item=0xfffff800259e2800) at /usr/src/sys/vm/uma.h:404 > > > #17 0xffffffff802dc3c3 in xpt_free_ccb (free_ccb=0xfffff800259e2800) > > > at /usr/src/sys/cam/cam_xpt.c:4676 > > > #18 0xffffffff802dacf1 in camperiphdone (periph=0xfffff80025329b00, > > > done_ccb=0xfffff80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427 > > > #19 0xffffffff802e4520 in xpt_done_process (ccb_h=0xfffff80025a24cc0) > > > at /usr/src/sys/cam/cam_xpt.c:5493 > > > #20 0xffffffff802e68e0 in xpt_done_td (arg=0xffffffff81143700 ) > > > at /usr/src/sys/cam/cam_xpt.c:5548 > > > #21 0xffffffff807673c7 in fork_exit (callout=0xffffffff802e6720 > > > , > > > arg=0xffffffff81143700 , frame=0xfffffe00c6268c00) > > > at /usr/src/sys/kern/kern_fork.c:1083 > > > #22 > > > > > > [kgdb stuff removed] > > > > > > (kgdb) down > > > #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000, > > > item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374 > > > 4374 item_dtor(zone, item, cache_uz_size(cache), udata, > > > SKIP_NONE); > > > (kgdb) down > > > #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000, > > > item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE) > > > at /usr/src/sys/vm/uma_core.c:3418 > > > 3418 uma_dbg_free(zone, NULL, item); > > > (kgdb) p/x skipdbg > > > $26 = 0x0 > > > (kgdb) p/x zone->uz_flags > > > $27 = 0x41000000 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR) > > > (kgdb) down > > > #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000, > > > slab=0xfffff800259e2fd8, item=0xfffff800259e2800) > > > at /usr/src/sys/vm/uma_core.c:5659 > > > 5659 panic("Unaligned free of %p from zone %p(%s) slab > > > %p(%d)", > > > > > > Note that item is the same as saved_ccb. > > > > > > (kgdb) p/x *zone->uz_keg > > > $28 = {uk_hash = {uh_slab_hash = 0x0, uh_hashsize = 0x0, uh_hashmask = > > > 0x0}, > > > uk_zones = {lh_first = 0xfffffe00dc9d2000}, uk_dr = { > > > dr_policy = 0xffffffff810010e0, dr_iter = 0x0}, uk_align = 0x7, > > > uk_reserve = 0x0, uk_size = 0x220, uk_rsize = 0x220, > > > uk_init = 0xffffffff80c17d50, uk_fini = 0xffffffff80c17d80, > > > uk_allocf = 0xffffffff80d342f0, uk_freef = 0xffffffff80d346a0, > > > uk_offset = 0x0, uk_kva = 0x0, uk_pgoff = 0xfd8, uk_ppera = 0x1, > > > uk_ipers = 0x7, uk_flags = 0x0, uk_name = 0xffffffff80debbac, uk_link = { > > > le_next = 0xfffff80005968000, le_prev = 0xfffff80005968380}, > > > uk_domain = 0xfffff80005968240} > > > (kgdb) p/x *slab > > > $29 = {us_link = {le_next = 0xdeadc0dedeadc0de, le_prev = > > > 0xdeadc0dedeadc0de}, > > > us_freecount = 0xc0de, us_flags = 0xad, us_domain = 0xde, us_free = { > > > __bits = 0xfffff800259e2ff0}} > > > (kgdb) p/x *0xfffff800259e2ff0 > > > $30 = 0xdeadc0de > > > Don't know whether this matters, but slab seems to be unitialized. > > > (kgdb) p/x freei > > > $31 = 0x3 > > > > > > In any case, saved_ccb has an address which lies outside the range > > > covered by slab, i.e. freei is bigger than the number of entries in > > > slab. > > > > > > I suspect that the only way to really figure out what's going on is to > > > run the kernel in kgbd and set lots of breakpoints, > > > > > > > What's happening is this, I think. > > > > (1) We send a request. > > (2) It fails, so we send a start unit > > BUT we do weird things to copy the CCBs around, and the request from (1) > > and (2) have different allocations. Error recovery overwrites the original > > request with a new request after saving it off.... > > (3) start unit succeeds, we go to free one of the CCBs and it's marked > > incorrectly, triggering either this panic or the prior one we saw. > > > > These actions were fine when there was one allocator, but now that there > > are two more care must be taken, and that more care hasn't been taken yet, > > so kern.cam.da.enable_uma_ccbs=1 is unsafe for now and should not be used. > > kern.cam.ada.enable_uma_ccbs=1 is fine as its error recovery doesn't do > > these things. > > > > I noticed that the value of vm.debug.divisor affects what value is > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few > different values. > > The returned value is used to set skipdbg in uma_core.c:item_dtor(). > > The default is vm.debug.divisor=1. > > vm.debug.divisor is only present when INVARIANTS is defined. > > kskipdbg eventually affects the value of freei. > > With these values: > vm.debug.divisor: 0 > kern.cam.da.enable_uma_ccbs: 1 > I can turn on the disk and it comes up without a panic! > > However, I didn't try to do any large data transfers to the disk. > > So, it appears that at least vm.debug.divisor is a big factor in > whether or not a panic happens with INVARIANTS. > I decided to do a real test. So I built a kernel w/o INVARIANTS and installed it to /boot/test. Then I stuck a 160GB disk I had around into an external USB3 enclosure and put a filesystem on it. The I booted the new kernel from /boot/test and set the sysctls so: kern.cam.da.enable_uma_ccbs: 1 kern.cam.ada.enable_uma_ccbs: 1 After that I plugged in the external USB3 enclosure and copied about 114GiB of data from an internal SSD to it - without a kernel panic: Filesystem Size Used Avail Capacity Mounted on /dev/da0p1 144G 114G 18G 86% /mnt I'm pretty sure that's more than I could copy without a kernel panic prior to the recent changes made in cam and umass. My test may not be real proof that all bugs have been squashed, but it certainly seems to be a better situation than we had before. -- Gary Jennejohn