From: Mark Millard
Date: Mon, 26 Jun 2023 01:32:03 -0700
To: John F Carr
Cc: Current FreeBSD, freebsd-arm
Subject: Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit
Message-Id: <64F18C76-BD2A-4608-A8CC-38AC2820FC12@yahoo.com>
In-Reply-To: <4A380699-7C9E-4E2E-8DCD-F9ECC2112667@yahoo.com>
References: <3FD359F8-CFCC-400F-B6DE-B635B747DE7F.ref@yahoo.com> <3FD359F8-CFCC-400F-B6DE-B635B747DE7F@yahoo.com> <4A380699-7C9E-4E2E-8DCD-F9ECC2112667@yahoo.com>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Jun 24, 2023, at 17:25, Mark Millard wrote:

> On Jun 24, 2023, at 14:26, John F Carr wrote:
>
>>
>>> On Jun 24, 2023, at 13:00, Mark Millard wrote:
>>>
>>> The running system build is a non-debug build (but
>>> with symbols not stripped).
>>>
>>> The HoneyComb's console log shows:
>>>
>>> . . .
>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
>>> GEOM_NOP: Device md0.nop created.
>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>>> GEOM_NOP: Device md0.nop removed.
>>> GEOM_NOP: Device md0.nop created.
>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>> GEOM_NOP: Device md0.nop removed.
>>> GEOM_NOP: Device md0.nop created.
>>> GEOM_NOP: Device md0.nop removed.
>>> Fatal data abort:
>>> x0: ffffa02506e64400
>>> x1: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>> x2: 4b
>>> x3: a343932b0b22fb30
>>> x4: 0
>>> x5: 3310b0d062d0e1d
>>> x6: 1d0e2d060d0b3103
>>> x7: 0
>>> x8: ea325df8
>>> x9: ffff0001eec946d0 ($d.6 + 0)
>>> x10: ffff0001ea401880 (g_raid3_post_sync + 3a145f8)
>>> x11: 0
>>> x12: 0
>>> x13: ffff000000cd8960 (lock_class_mtx_sleep + 0)
>>> x14: 0
>>> x15: ffffa02506e64405
>>> x16: ffff0001eec94860 (_DYNAMIC + 160)
>>> x17: ffff00000063a450 (ifc_attach_cloner + 0)
>>> x18: ffff0001eb290400 (g_raid3_post_sync + 48a3178)
>>> x19: ffff0001eec94600 (vnet_epair_init_vnet_init + 0)
>>> x20: ffff000000fa5b68 (vnet_sysinit_sxlock + 18)
>>> x21: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x22: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x23: ffffa0000042e500
>>> x24: ffffa0000042e500
>>> x25: ffff000000ce0788 (linker_lookup_set_desc + 0)
>>> x26: ffffa0203cdef780
>>> x27: ffff0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>>> x28: ffff000000d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>> x29: ffff0001eb290430 (g_raid3_post_sync + 48a31a8)
>>> sp: ffff0001eb290400
>>> lr: ffff0001eec82a4c ($x.1 + 3c)
>>> elr: ffff0001eec82a60 ($x.1 + 50)
>>> spsr: 60000045
>>> far: ffff0002d8fba4c8
>>> esr: 96000046
>>> panic: vm_fault failed: ffff0001eec82a60 error 1
>>> cpuid = 14
>>> time = 1687625470
>>> KDB: stack backtrace:
>>> db_trace_self() at db_trace_self
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>> vpanic() at vpanic+0x13c
>>> panic() at panic+0x44
>>> data_abort() at data_abort+0x2fc
>>> handle_el1h_sync() at handle_el1h_sync+0x14
>>> --- exception, esr 0x96000046
>>> $x.1() at $x.1+0x50
>>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>>> linker_load_module() at linker_load_module+0xae4
>>> kern_kldload() at kern_kldload+0xfc
>>> sys_kldload() at sys_kldload+0x60
>>> do_el0_sync() at do_el0_sync+0x608
>>> handle_el0_sync() at handle_el0_sync+0x44
>>> --- exception, esr 0x56000000
>>> KDB: enter: panic
>>> [ thread pid 70419 tid 101003 ]
>>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200]
>>> db>
>>
>> The failure appears to be initializing module if_epair.
>
> Yep: trying:
>
> # kldload if_epair.ko
>
> was enough to cause the crash. (Just a HoneyComb context at
> that point.)
>
> I tried media dd'd from the recent main snapshot, booting the
> same system. No crash. I moved my build boot media to some
> other systems and tested them: crashes. I tried my boot media
> built optimized for Cortex-A53 or Cortex-X1C/Cortex-A78C
> instead of Cortex-A72: no crashes. (But only one system can
> use the X1C/A78C code in that build.)
>
> So variation testing only gets the crashes for my builds
> that are code-optimized for Cortex-A72's. The same source
> tree vintage built for Cortex-A53 or Cortex-X1C/Cortex-A78C
> optimization does not get the crashes. But I also
> demonstrated an optimized-for-Cortex-A72 build from 2023-Mar
> that gets the crash.
>
> The last time I ran into one of these "crashes tied to
> cortex-a72 code optimization" examples it turned out to be
> some missing memory-model management code in FreeBSD's USB
> code. But being lucky enough to help identify a FreeBSD
> source code problem again seems not that likely. It could
> easily be a code generation error by clang for all I know.
>
> So, unless at some point I produce fairly solid evidence
> that the code actually running is messed up by FreeBSD
> source code, this should likely be treated as "blame the
> operator" and should likely be largely ignored as things
> are. (Just My Problem, as I want the Cortex-A72 optimized
> builds.)

Turns out that the source code in question is the
assignment to V_epair_cloner below:

static void
vnet_epair_init(const void *unused __unused)
{
        struct if_clone_addreq req = {
                .match_f = epair_clone_match,
                .create_f = epair_clone_create,
                .destroy_f = epair_clone_destroy,
        };
        V_epair_cloner = ifc_attach_cloner(epairname, &req);
}
VNET_SYSINIT(vnet_epair_init, SI_SUB_PSEUDO, SI_ORDER_ANY,
    vnet_epair_init, NULL);

Example code when not optimizing for the Cortex-A72:

   11a4c: d0000089      adrp    x9, 0x23000
   11a50: f9400248      ldr     x8, [x18]
   11a54: f942c508      ldr     x8, [x8, #1416]
   11a58: f943d929      ldr     x9, [x9, #1968]
   11a5c: a9437bfd      ldp     x29, x30, [sp, #48]
   11a60: f9401508      ldr     x8, [x8, #40]
   11a64: f8296900      str     x0, [x8, x9]

The code when optimizing for the Cortex-A72:

   11a4c: f9400248      ldr     x8, [x18]
   11a50: f942c508      ldr     x8, [x8, #1416]
   11a54: d503201f      nop
   11a58: 1008e3c9      adr     x9, #72824
   11a5c: f9401508      ldr     x8, [x8, #40]
   11a60: f8296900      str     x0, [x8, x9]
   11a64: a9437bfd      ldp     x29, x30, [sp, #48]

It is the "str x0, [x8, x9]" that gets the vm_fault in the
optimized code.
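
To make the difference between the two sequences concrete, here is a minimal sketch that decodes the adr word from the listing and computes which address each form refers to. It assumes the standard A64 ADR encoding (immlo in bits 30:29, immhi in bits 23:5) and treats the addresses in the listings as offsets within the module file; the function names are hypothetical, not anything from the FreeBSD tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the signed byte offset carried by an A64 ADR instruction word.
 * Assumed encoding: bit 31 = 0, bits 30:29 = immlo, bits 28:24 = 0b10000,
 * bits 23:5 = immhi, bits 4:0 = destination register.
 */
static int64_t
adr_offset(uint32_t insn)
{
        uint32_t immlo, immhi;
        int64_t imm;

        /* Reject anything that is not an ADR (this also rejects ADRP). */
        assert((insn & 0x9f000000u) == 0x10000000u);

        immlo = (insn >> 29) & 0x3u;
        immhi = (insn >> 5) & 0x7ffffu;
        imm = (int64_t)((immhi << 2) | immlo);  /* 21-bit immediate */
        if (imm & (1 << 20))                    /* sign-extend from bit 20 */
                imm -= (1 << 21);
        return (imm);
}

/* Address that "adr xN, #offset" at address pc puts in xN. */
static uint64_t
adr_target(uint64_t pc, uint32_t insn)
{
        return (pc + (uint64_t)adr_offset(insn));
}

/* Address of the 64-bit slot read by "adrp xN, page; ldr xN, [xN, #imm]". */
static uint64_t
adrp_ldr_slot(uint64_t page, uint64_t ldr_imm)
{
        return (page + ldr_imm);
}
```

With the values from the listings, adr_offset(0x1008e3c9) gives 72824 and adr_target(0x11a58, 0x1008e3c9) gives 0x236d0, while adrp_ldr_slot(0x23000, 1968) gives 0x237b0. So the unoptimized code loads x9 from a memory slot, whereas the optimized code computes an address directly from the instruction's own location. The low 12 bits of that result (0x6d0) match the x9 value in the register dump, which is what one would expect if the module is loaded at a page-aligned address.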

So:

   11a4c: d0000089      adrp    x9, 0x23000
   11a58: f943d929      ldr     x9, [x9, #1968]

was optimized via replacement by:

   11a58: 1008e3c9      adr     x9, #72824

I.e., the optimization relies on a fixed offset from the
instruction to produce the value in x9, even when the
instruction is relocated. This resulted in the specific
x9 value shown in the x8/x9 pair:

x8: ea325df8
x9: ffff0001eec946d0

which sum to the fault address (the value in far):

far: ffff0002d8fba4c8

> Sorry for the noise.
>
>> I see no recent changes in that module that would be likely to break initialization.
>>
>> a9bfd080d09a if_epair: do not transmit packets that exceed the interface MTU
>> 4d846d260e2b spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
>> a6b55ee6be15 net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH
>> c69ae8419734 if_epair: also remove vlan metadata from mbufs
>> 29c9b1673305 epair: Remove unneeded includes and sort some of the rest
>
> My kyua run examples included a Cortex-A72 optimized system build
> from 2023-Mar. It also crashes. It looks like my last kyua
> runs were back in 2022-Jan or so, associated with some ASAN and
> UBSAN experiments, and so would have been on amd64, not aarch64.
> Otherwise any aarch64 ones would be even older. I've no useful
> narrowing of the potential time frame for the problem starting.

===
Mark Millard
marklmi at yahoo.com
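
For anyone wanting to recheck the register arithmetic, a small sketch follows. It assumes only that "str x0, [x8, x9]" stores to x8 + x9, and that the addresses in the disassembly are offsets from the module's load address; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "str x0, [x8, x9]": base plus index register. */
static uint64_t
str_reg_offset_addr(uint64_t base, uint64_t index)
{
        return (base + index);  /* wraps modulo 2^64, like the hardware */
}

/* Load address implied by a run-time address and its offset in the file. */
static uint64_t
load_base(uint64_t runtime_addr, uint64_t file_offset)
{
        assert(runtime_addr >= file_offset);
        return (runtime_addr - file_offset);
}
```

Here str_reg_offset_addr(0xea325df8, 0xffff0001eec946d0) evaluates to 0xffff0002d8fba4c8, the far value. As a cross-check, load_base(0xffff0001eec82a60, 0x11a60), using elr and the offset of the str, and load_base(0xffff0001eec946d0, 0x236d0), using x9 and 0x11a58 + 72824, both evaluate to 0xffff0001eec71000, which is consistent with x9 holding an address inside the loaded module.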