From: Andrew Turner
To: Michael Tuexen
Cc: Mark Millard, freebsd-arm@freebsd.org
Subject: Re: register x18
Date: Fri, 16 Jul 2021 17:53:14 +0200
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

> On 16 Jul 2021, at 17:07, Michael Tuexen wrote:
>
>> On 16. Jul 2021, at 14:51, Andrew Turner wrote:
>>
>>> On 16 Jul 2021, at 13:08, tuexen@freebsd.org wrote:
>>>
>>>> On 16. Jul 2021, at 04:06, Mark Millard wrote:
>>>>
>>>> On 2021-Jul-15, at 17:40, Michael Tuexen wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> register x18 seems to be special. What is it used for in FreeBSD?
>>>>>
>>>>> Best regards
>>>>> Michael
>>>>
>>>> https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers
>>>>
>>>> reports:
>>>>
>>>> QUOTE
>>>> • X18 is the platform register and is reserved for the use of platform ABIs. This is an additional temporary register on platforms that don't assign a special meaning to it.
>>>> END QUOTE
>>>>
>>>> So, special, yes. But I do not know what the "platform ABI" usage
>>>> for it might be on FreeBSD. So, for the most part, this does not
>>>> well-answer your question. Sorry.
>>> Yepp, I found the above text. However, x18 seems to be used when accessing
>>> global variables.
>>> I am looking at a panic, where the system panics on accessing a
>>> global variable, which can be controlled by sysctl.
>>> It seems that x18 does not have the expected value, but it is also not set in
>>> the function...
>>
>> X18 is used to store the pointer to the pcpu data. It should only ever be set when we enter the kernel from userland by the exception handler.
> Hi Andrew,
>
> thanks for the response. Hmm. I was hoping that the answer helps me to understand
> a panic that I'm observing when stress testing the TCP RACK stack. I'm transferring
> 10GB via scp and at some point in time (not right at the beginning), the machine panics.
> The machine is an eMAG system.
>
> Here is what I know:
>
> Initially it panics multiple times (always at the same place) in
> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16540
> when it is trying to read V_tcp_map_entries_limit.
>
> I discussed this with rrs@ and since we had no clue, I tried to just compile
> out the if condition.
>
> Then it panicked in
> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16928
> at
> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n15664
> which is basically the next place where a V_ variable is accessed.
>
> Please note that for debugging I'm using a kernel without VIMAGE support,
> since we initially thought that it might be related to a VNET bug.
>
> So I decided to look at the disassembly of rack_sndbuf_autoscale (I added some comments):
>
>    0xffff000001388a6c <+0>:     stp   x29, x30, [sp, #-32]!
>    0xffff000001388a70 <+4>:     str   x19, [sp, #16]
>    0xffff000001388a74 <+8>:     mov   x29, sp
>    0xffff000001388a78 <+12>:    ldr   x9, [x0, #24]             // x9 = rack->tp;
>    0xffff000001388a7c <+16>:    ldr   w8, [x0, #188]            // w8 = rack->r_ctl.cwnd_to_use
>    0xffff000001388a80 <+20>:    adrp  x12, 0xffff0000013ac000
>    0xffff000001388a84 <+24>:    ldr   w10, [x9, #52]            // w10 = tp->snd_wnd;
>    0xffff000001388a88 <+28>:    ldr   x11, [x18]
>    0xffff000001388a8c <+32>:    ldr   x11, [x11, #1256]
>    0xffff000001388a90 <+36>:    cmp   w8, w10
>    0xffff000001388a94 <+40>:    csel  w10, w8, w10, cc          // cc = lo, ul, last // min(rack->r_ctl.cwnd_to_use, tp->snd_wnd);
> => 0xffff000001388a98 <+44>:    ldr   x11, [x11, #40]
>    0xffff000001388a9c <+48>:    ldr   x12, [x12, #2752]
>    0xffff000001388aa0 <+52>:    ldr   w11, [x11, x12]           // w11 = V_tcp_do_autosndbuf ???
>    0xffff000001388aa4 <+56>:    cbz   w11, 0xffff000001388be0
>    0xffff000001388aa8 <+60>:    ldr   x8, [x0, #32]             // x8 = rack->rc_inp
>    0xffff000001388aac <+64>:    ldr   x19, [x8, #120]           // x19 = so = x8->inp_socket
>    0xffff000001388ab0 <+68>:    ldrb  w8, [x19, #817]           // w8 = (x19->so_snd.sb_flags >> 8) & 0xff
>    0xffff000001388ab4 <+72>:    tbz   w8, #3, 0xffff000001388be0 // so->so_snd.sb_flags & SB_AUTOSIZE == 0
>    0xffff000001388ab8 <+76>:    ldr   w11, [x9, #52]            // w11 = tp->snd_wnd
>    0xffff000001388abc <+80>:    ldr   w8, [x19, #740]           // w8 = so->so_snd.sb_hiwat
>    0xffff000001388ac0 <+84>:    lsr   w11, w11, #2
>    0xffff000001388ac4 <+88>:    add   w11, w11, w11, lsl #2
>    0xffff000001388ac8 <+92>:    cmp   w11, w8
>    0xffff000001388acc <+96>:    b.cc  0xffff000001388be0        // b.lo, b.ul, b.last
>    0xffff000001388ad0 <+100>:   ldr   w11, [x19, #736]
>    0xffff000001388ad4 <+104>:   lsr   w8, w8, #3
>    0xffff000001388ad8 <+108>:   lsl   w12, w8, #3
>    0xffff000001388adc <+112>:   sub   w8, w12, w8
>    0xffff000001388ae0 <+116>:   cmp   w11, w8
>    0xffff000001388ae4 <+120>:   b.cc  0xffff000001388be0        // b.lo, b.ul, b.last
>    0xffff000001388ae8 <+124>:   ldr   x8, [x18]
>    0xffff000001388aec <+128>:   ldr   x8, [x8, #1256]
>    0xffff000001388af0 <+132>:   ldr   x12, [x8, #40]
>    0xffff000001388af4 <+136>:   adrp  x8, 0xffff0000013ac000
>    0xffff000001388af8 <+140>:   ldr   x8, [x8, #2760]
>    0xffff000001388afc <+144>:   ldr   w12, [x12, x8]
>    0xffff000001388b00 <+148>:   cmp   w11, w12
>
> So it seems that the code accessing V_tcp_do_autosndbuf is:
>
>    0xffff000001388a80 <+20>:    adrp  x12, 0xffff0000013ac000
>    ...
>    0xffff000001388a88 <+28>:    ldr   x11, [x18]
>    0xffff000001388a8c <+32>:    ldr   x11, [x11, #1256]
>    ...
> => 0xffff000001388a98 <+44>:    ldr   x11, [x11, #40]
>    0xffff000001388a9c <+48>:    ldr   x12, [x12, #2752]
>    0xffff000001388aa0 <+52>:    ldr   w11, [x11, x12]           // w11 = V_tcp_do_autosndbuf ???
>
> and for V_tcp_autosndbuf_max it is:
>    0xffff000001388ae8 <+124>:   ldr   x8, [x18]
>    0xffff000001388aec <+128>:   ldr   x8, [x8, #1256]
>    0xffff000001388af0 <+132>:   ldr   x12, [x8, #40]
>    0xffff000001388af4 <+136>:   adrp  x8, 0xffff0000013ac000
>    0xffff000001388af8 <+140>:   ldr   x8, [x8, #2760]
>    0xffff000001388afc <+144>:   ldr   w12, [x12, x8]
>
> The #2752 versus #2760 could be the offset of the variable.
>
> Does the above code make sense to you? The code relevant for the crash seems to be:
>
>    0xffff000001388a88 <+28>:    ldr   x11, [x18]
>    0xffff000001388a8c <+32>:    ldr   x11, [x11, #1256]
>    0xffff000001388a98 <+44>:    ldr   x11, [x11, #40]
>
> Since it is crashing at 0xffff000001388a98 <+44>, my assumption was that x18 is wrong...
> But does this use fit your description?

This code is loading curthread from the pcpu data, then loading whatever is 1256 bytes within struct thread. I checked the offset of td_vnet and found it was at the correct location, so it would appear to be using VIMAGE and has a bad vnet pointer.

The other assembly above also looks like it's using VIMAGE, as it has similar code with the same offsets.

>
> I'm trying to debug this on arm64, since I can reproduce it on arm64.
> But there is also a bug report that this happens on amd64:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257195
>
> Any idea what can be wrong? Any hint how to progress?

If you can reproduce on amd64 it might pay to test with KASAN.

How stable is the bad pointer value? It might pay to add KASSERTs to the code to check that curvnet (the macro to get td_vnet) is not the bad value, or is at least greater than VM_MIN_KERNEL_ADDRESS.

Andrew
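
For reference, the load sequence discussed above and the suggested sanity check can be sketched in user-space C. This is only an illustrative model, not kernel code: the struct layouts are simplified stand-ins, SKETCH_MIN_KERNEL_ADDRESS and both helper functions are hypothetical names, and reading the "#40" load as the vnet's data base pointer is an assumption rather than something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-ins for the kernel structures. Real layouts differ;
 * only the shape of the pointer chain matters for this sketch.
 */
struct vnet {
	uintptr_t vnet_data_base;	/* assumed target of "ldr x11, [x11, #40]" */
};

struct thread {
	struct vnet *td_vnet;		/* "ldr x11, [x11, #1256]" in the thread */
};

struct pcpu {
	struct thread *pc_curthread;	/* "ldr x11, [x18]" */
};

/* Placeholder bound; the real VM_MIN_KERNEL_ADDRESS is platform-specific. */
#define SKETCH_MIN_KERNEL_ADDRESS ((uintptr_t)0x1000)

/* Returns 1 if the vnet pointer looks plausible, 0 otherwise. */
static int
vnet_ptr_plausible(const struct vnet *vnet, const struct vnet *known_bad)
{
	if (vnet == NULL || vnet == known_bad)
		return (0);
	return ((uintptr_t)vnet >= SKETCH_MIN_KERNEL_ADDRESS);
}

/*
 * Follows pcpu -> curthread -> td_vnet -> vnet_data_base + var_offset.
 * Stores the value in *out and returns 0, or returns -1 without
 * dereferencing the vnet if the pointer fails the plausibility check.
 */
static int
read_vnet_int(const struct pcpu *pc, uintptr_t var_offset,
    const struct vnet *known_bad, int *out)
{
	const struct thread *td;
	const struct vnet *vnet;

	assert(pc != NULL && out != NULL);
	td = pc->pc_curthread;
	vnet = td->td_vnet;
	if (!vnet_ptr_plausible(vnet, known_bad))
		return (-1);
	*out = *(const int *)(vnet->vnet_data_base + var_offset);
	return (0);
}
```

In the kernel the equivalent would presumably be a KASSERT on curvnet placed just before the V_ variable access, with VM_MIN_KERNEL_ADDRESS as the bound and the observed bad value (if it is stable) as known_bad.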