Date: Sat, 21 Jun 2025 15:49:13 +0000 (UTC)
From: "Bjoern A. Zeeb"
To: current@freebsd.org
Subject: regression: memory issues on main/arm64 over sched/runq changes
Message-ID: <43005447-2rq0-6nn2-pnr5-4939s112npr4@yvfgf.mnoonqbm.arg>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

Hi,

it's too early for stab-week but ...

I had interface groups ("all") disappear from the interface between
interface creation and ifconfig prints during rc stage:

if7: XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ if_getgroup:1647: ifgl 0xffffa080011aec90, ifgl_group 0, ifg_group 0
panic: vm_fault failed: 0xffff0000005e19c8 error 1
cpuid = 0
time = 8
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
data_abort() at data_abort+0x28c
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x96000004
strlcpy() at strlcpy+0x20
ifhwioctl() at ifhwioctl+0x998
ifioctl() at ifioctl+0x8bc
kern_ioctl() at kern_ioctl+0x2e4
sys_ioctl() at sys_ioctl+0x140
do_el0_sync() at do_el0_sync+0x618
handle_el0_sync() at handle_el0_sync+0x4c
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 635 tid 100249 ]
Stopped at      kdb_enter+0x48: str     xzr, [x19, #2432]

I instrumented the kernel and could not find any deletions.

It was all the more strange given the machine has 10 physical interfaces
+ lo and it happened only for #7 and #8.

I added guards to the struct and that did not reveal any memory
corruption. I added a loop right at the end of if_addgroup() to make sure
the list was coherent, and it was (incl. lo, which has two groups).

Then I started over-allocating the structs (size * 3) for ifgl and ifg
and put the actual value in the middle. That worked, and the two guard
structs showed no sign of memory corruption. So the larger allocation
apparently helped or changed timing (which the printfs had not).

Then I undid the changes and backed out to b93161a7e38d, and that works
just fine. Went to c29459f901dc, which shows the problem and panics again.
Reduced it to eebc148f25c3.

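For anyone wanting to repeat the over-allocation experiment, the idea can be sketched in userland C roughly as follows. This is only an illustration under assumptions: struct item, GUARD_BYTE and the item_*() helpers are hypothetical names, not the actual ifgl/ifg code from the kernel, and malloc()/free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a small struct such as a group list entry. */
struct item {
	char name[16];
	int refcnt;
};

/* Assumed fill pattern for the guard slots. */
#define GUARD_BYTE 0xA5

/*
 * Allocate room for three items (size * 3) and hand out the middle one.
 * The slots before and after it are filled with a known pattern.
 */
struct item *
item_alloc_guarded(void)
{
	struct item *base;

	base = malloc(3 * sizeof(*base));
	if (base == NULL)
		return (NULL);
	memset(&base[0], GUARD_BYTE, sizeof(*base));
	memset(&base[1], 0, sizeof(*base));
	memset(&base[2], GUARD_BYTE, sizeof(*base));
	return (&base[1]);
}

/* Return 1 if both guard slots still hold the pattern, 0 otherwise. */
int
item_guards_intact(const struct item *it)
{
	const unsigned char *lo = (const unsigned char *)(it - 1);
	const unsigned char *hi = (const unsigned char *)(it + 1);
	size_t i;

	for (i = 0; i < sizeof(*it); i++)
		if (lo[i] != GUARD_BYTE || hi[i] != GUARD_BYTE)
			return (0);
	return (1);
}

/* Free the whole triple; the caller only ever saw the middle slot. */
void
item_free_guarded(struct item *it)
{
	free(it - 1);
}
```

Calling item_guards_intact() at the points where the entry is read would flag an adjacent overwrite. Note that such a change also alters allocation size and layout, so a problem going away with it (as it did here) does not by itself show where the fault lies.
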
So it's in the range of:

% git log --oneline b93161a7e38d..eebc148f25c3
eebc148f25c3 sched_4bsd: ESTCPULIM(): Allow any value in the timeshare range
51a4ae05abe6 sched_4bsd: Remove RQ_PPQ from ESTCPULIM()'s formula
a454ff6b0440 sched_4bsd: Move ESTCPULIM() after its macro dependencies
a33225efb4bc sched_ule: Sanitize CPU's use and priority computations, and ticks storage
6792f3411f6d sched_ule: Recover previous nice and anti-starvation behaviors
dee257c28d93 sched: Internal priority ranges: Reduce kernel, increase timeshare
d710acecc00f runq: Add copyright
055b5b5f850d runq: Restrict to kernel only
a2d1c3bc2bb4 epoch_test: Assign different priorities using offset 1
b2a9ee2a72ea runq: Remove userland references to RQ_PPQ in rtprio contexts
e3a4b989d7f7 runq: Bump __FreeBSD_version after switching to 256 levels
af8de65ef23e runq: Switch to 256 levels
fd141584cf89 zfs: spa: ZIO_TASKQ_ISSUE: Use symbolic priority
8ecc41918066 Internal scheduling priorities: Always use symbolic ones
baecdea10eb5 sched_ule: Use a single runqueue per CPU
fdf31d274769 sched_ule: runq_steal_from(): Suppress first thread special case
f4be333bc567 sched_ule: Re-implement stealing on top of runq common-code
9c3f4682bb90 runq: New runq_findq(), common low-level search implementation
a31193172cb9 runq: New function runq_is_queue_empty(); Use it in ULE
757bab06fb59 runq: Tidy up and rename runq_setbit() and runq_clrbit()
de78657a3aef runq: runq_check(): Re-implement on top of runq_findq()
439dc920f2d8 runq: Revamp runq_find*(), new runq_find_range()
200fc93dace7 runq: Re-order functions more logically
7e2502e3dec9 runq: More macros; Better and more consistent naming
57540a0666f6 runq: Clarity and style pass
a11926f2a5f0 runq: API tidy up: 'pri' => 'idx', 'idx' as int, remove runq_remove_idx()
28b54827f5c1 runq: Hide function prototypes under _KERNEL
c21c24adde98 runq: More selective includes of to reduce pollution
2fefe2c88b31 runq: Deduce most parameters, remove machine headers

I do not know if it's feasible or doable to bisect those changes further?

/bz

-- 
Bjoern A. Zeeb                                                     r15:7