From: Ze Dupsys
Reply-To: zedupsys@gmail.com
To: Roger Pau Monné
Cc: freebsd-xen@freebsd.org, buhrow@nfbcal.org
Subject: Re: ZFS + FreeBSD XEN dom0 panic
Date: Mon, 11 Apr 2022 11:47:50 +0300
Message-ID: <1286cb59-867e-e7d0-2bd3-45c33feae66a@gmail.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-xen

On 2022.04.08. 18:02, Roger Pau Monné wrote:
> On Fri, Apr 08, 2022 at 10:45:12AM +0300, Ze Dupsys wrote:
>> On 2022.04.05. 18:22, Roger Pau Monné wrote:
>>> ..
>>> Thanks, sorry for the late reply, somehow the message slip.
>>>
>>> I've been able to get the file:line for those, and the trace is kind
>>> of weird, I'm not sure I know what's going on TBH. It seems to me the
>>> backend instance got freed while being in the process of connecting.
>>>
>>> I've made some changes, that might mitigate this, but having not a
>>> clear understanding of what's going on makes this harder.
>>>
>>> I've pushed the changes to:
>>>
>>> http://xenbits.xen.org/gitweb/?p=people/royger/freebsd.git;a=shortlog;h=refs/heads/for-leak
>>>
>>> (This is on top of main branch).
>>>
>>> I'm also attaching the two patches on this email.
>>>
>>> Let me know if those make a difference to stabilize the system.
>>
>> Hi,
>>
>> Yes, it stabilizes the system, but there is still a memleak somewhere, i
>> think.
>>
>> System could run tests for approximately 41 hour, did not panic, but started
>> to OOM kill everything.
>>
>> I did not know how to git clone given commit, thus i just applied patches to
>> 13.0-RELEASE sources.
>>
>> Serial logs have nothing unusual, just that at some point OOM kill starts.
>
> Well, I think that's good^W better than before. Thanks again for all
> the testing.
>
> It might be helpful now to start dumping `vmstat -m` periodically
> while running the stress tests. As there are (hopefully) no more
> panics now vmstat might report us what subsystem is hogging the
> memory. It's possible it's blkback (again).
>
> Thanks, Roger.
>

Yes, it certainly is better. I applied the patch on my pre-production
server and have not had any panic since then; still testing, though.

On my stressed lab server it is a different story. Occasionally i see a
panic with the trace below on serial. I can not reproduce it reliably:
sometimes it happens upon starting dom id 1 and 2, sometimes
mid-stress-test, at dom id > 95.

panic: pmap_growkernel: no memory to grow kernel
cpuid = 2
time = 1649485133
KDB: stack backtrace:
#0 0xffffffff80c57385 at kdb_backtrace+0x65
#1 0xffffffff80c09d61 at vpanic+0x181
#2 0xffffffff80c09bd3 at panic+0x43
#3 0xffffffff81073eed at pmap_growkernel+0x27d
#4 0xffffffff80f2d918 at vm_map_insert+0x248
#5 0xffffffff80f30079 at vm_map_find+0x549
#6 0xffffffff80f2bda6 at kmem_init+0x226
#7 0xffffffff80c731a1 at vmem_xalloc+0xcb1
#8 0xffffffff80c72a9b at vmem_xalloc+0x5ab
#9 0xffffffff80c724a6 at vmem_alloc+0x46
#10 0xffffffff80f2ac6b at kva_alloc+0x2b
#11 0xffffffff8107f0eb at pmap_mapdev_attr+0x27b
#12 0xffffffff810588ca at nexus_add_irq+0x65a
#13 0xffffffff81058710 at nexus_add_irq+0x4a0
#14 0xffffffff810585b9 at nexus_add_irq+0x349
#15 0xffffffff80c495c1 at bus_alloc_resource+0xa1
#16 0xffffffff8105e940 at xenmem_free+0x1a0
#17 0xffffffff80a7e0dd at xbd_instance_create+0x943d

The same frames resolved to file:line:

| sed -Ee 's/^#[0-9]* //' -e 's/ .*//' | xargs addr2line -e /usr/lib/debug/boot/kernel/kernel.debug
/usr/src/sys/kern/subr_kdb.c:443
/usr/src/sys/kern/kern_shutdown.c:0
/usr/src/sys/kern/kern_shutdown.c:843
/usr/src/sys/amd64/amd64/pmap.c:0
/usr/src/sys/vm/vm_map.c:0
/usr/src/sys/vm/vm_map.c:0
/usr/src/sys/vm/vm_kern.c:712
/usr/src/sys/kern/subr_vmem.c:928
/usr/src/sys/kern/subr_vmem.c:0
/usr/src/sys/kern/subr_vmem.c:1350
/usr/src/sys/vm/vm_kern.c:150
/usr/src/sys/amd64/amd64/pmap.c:0
/usr/src/sys/x86/x86/nexus.c:0
/usr/src/sys/x86/x86/nexus.c:449
/usr/src/sys/x86/x86/nexus.c:412
/usr/src/sys/kern/subr_bus.c:4620
/usr/src/sys/x86/xen/xenpv.c:123
/usr/src/sys/dev/xen/blkback/blkback.c:3010

With gdb backtrace i think i can get a better trace though:

#0 __curthread at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump at /usr/src/sys/kern/kern_shutdown.c:399
#2 kern_reboot at /usr/src/sys/kern/kern_shutdown.c:486
#3 vpanic at /usr/src/sys/kern/kern_shutdown.c:919
#4 panic at /usr/src/sys/kern/kern_shutdown.c:843
#5 pmap_growkernel at /usr/src/sys/amd64/amd64/pmap.c:208
#6 vm_map_insert at /usr/src/sys/vm/vm_map.c:1752
#7 vm_map_find at /usr/src/sys/vm/vm_map.c:2259
#8 kva_import at /usr/src/sys/vm/vm_kern.c:712
#9 vmem_import at /usr/src/sys/kern/subr_vmem.c:928
#10 vmem_try_fetch at /usr/src/sys/kern/subr_vmem.c:1049
#11 vmem_xalloc at /usr/src/sys/kern/subr_vmem.c:1449
#12 vmem_alloc at /usr/src/sys/kern/subr_vmem.c:1350
#13 kva_alloc at /usr/src/sys/vm/vm_kern.c:150
#14 pmap_mapdev_internal at /usr/src/sys/amd64/amd64/pmap.c:8974
#15 pmap_mapdev_attr at /usr/src/sys/amd64/amd64/pmap.c:8990
#16 nexus_map_resource at /usr/src/sys/x86/x86/nexus.c:523
#17 nexus_activate_resource at /usr/src/sys/x86/x86/nexus.c:448
#18 nexus_alloc_resource at /usr/src/sys/x86/x86/nexus.c:412
#19 BUS_ALLOC_RESOURCE at ./bus_if.h:321
#20 bus_alloc_resource at /usr/src/sys/kern/subr_bus.c:4617
#21 xenpv_alloc_physmem at /usr/src/sys/x86/xen/xenpv.c:121
#22 xbb_alloc_communication_mem at /usr/src/sys/dev/xen/blkback/blkback.c:3010
#23 xbb_connect at /usr/src/sys/dev/xen/blkback/blkback.c:3336
#24 xenbusb_back_otherend_changed at /usr/src/sys/xen/xenbus/xenbusb_back.c:228
#25 xenwatch_thread at /usr/src/sys/dev/xen/xenstore/xenstore.c:1003
#26 in fork_exit at /usr/src/sys/kern/kern_fork.c:1069
#27

There is some sort of mismatch in the info. The panic message printed on
serial was "panic: pmap_growkernel: no memory to grow kernel", but frame
#5 of the gdb backtrace,

#5 0xffffffff81073eed in pmap_growkernel at /usr/src/sys/amd64/amd64/pmap.c:208

leads to these lines:

    switch (pmap->pm_type) {
    ..
    panic("pmap_valid_bit: invalid pm_type %d", pmap->pm_type)

So either the trace or the message in the serial logs is off the mark.

If this were only memleak related, it should not happen when dom id 1 is
started, i suppose.

I am still gathering more info on the memleak case and will report when
it is available.

Thanks.
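
For the periodic `vmstat -m` dumps suggested in the quoted reply, a minimal sketch could look like the following. The function name `snapshot_loop`, the `SNAP_CMD` override, and the `snap-NNNNN.txt` file naming are assumptions made for illustration; none of them come from the thread. It only collects timestamped snapshots into a directory so they can be compared afterwards.

```shell
#!/bin/sh
# Sketch: take periodic snapshots of a command's output (default: vmstat -m)
# so that growth between snapshots can be compared later.
# snapshot_loop, SNAP_CMD and the file naming are hypothetical helpers.

snapshot_loop() {
    outdir=$1      # directory that receives one file per snapshot
    interval=$2    # seconds to sleep between snapshots
    count=$3       # number of snapshots to take
    cmd=${SNAP_CMD:-vmstat -m}

    mkdir -p "$outdir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # Zero-padded sequence number keeps the files sorted in capture order.
        file=$(printf '%s/snap-%05d.txt' "$outdir" "$i")
        {
            date -u '+# %Y-%m-%dT%H:%M:%SZ'
            # Intentionally unquoted so "vmstat -m" splits into command + flag.
            $cmd
        } > "$file" 2>&1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

Something like `snapshot_loop /var/tmp/vmstat-m 600 300` would then take a snapshot every 10 minutes for about 50 hours, and two snapshots can be compared with `diff` to see which malloc types keep growing.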