Date: Fri, 16 Jul 2021 17:53:14 +0200
From: Andrew Turner <andrew@fubar.geek.nz>
To: Michael Tuexen <tuexen@freebsd.org>
Cc: Mark Millard <marklmi@yahoo.com>, freebsd-arm@freebsd.org
Subject: Re: register x18
Message-ID: <D18F32F8-9BFD-4192-BC9E-59ABAC98EB88@fubar.geek.nz>
In-Reply-To: <4361A215-BB47-4166-BC3F-386E7834B788@freebsd.org>
References: <86EC9C12-F90C-4D0C-BFA3-41986C9F07B5@freebsd.org> <BFF3BCE7-3387-4A7C-A71C-890223CDDF18@yahoo.com> <32C24DDC-C8A1-43CD-9220-8009B229E452@freebsd.org> <ACD1D84A-5923-4106-AAE4-35FB7A182B0F@fubar.geek.nz> <4361A215-BB47-4166-BC3F-386E7834B788@freebsd.org>
> On 16 Jul 2021, at 17:07, Michael Tuexen <tuexen@freebsd.org> wrote:
>
>> On 16. Jul 2021, at 14:51, Andrew Turner <andrew@fubar.geek.nz> wrote:
>>
>>
>>> On 16 Jul 2021, at 13:08, tuexen@freebsd.org wrote:
>>>
>>>> On 16. Jul 2021, at 04:06, Mark Millard <marklmi@yahoo.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2021-Jul-15, at 17:40, Michael Tuexen <tuexen at freebsd.org> wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> register x18 seems to be special. What is it used for in FreeBSD?
>>>>>
>>>>> Best regards
>>>>> Michael
>>>>
>>>> https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers
>>>>
>>>> reports:
>>>>
>>>> QUOTE
>>>> • X18 is the platform register and is reserved for the use of platform ABIs. This is an additional temporary register on platforms that don't assign a special meaning to it.
>>>> END QUOTE
>>>>
>>>> So, special, yes. But I do not know what the "platform ABI" usage
>>>> for it might be on FreeBSD. So, for the most part, this does not
>>>> well-answer your question. Sorry.
>>> Yepp, I found the above text. However, x18 seems to be used when accessing
>>> global variables. I am looking at a panic where the system panics on accessing
>>> a global variable, which can be controlled by sysctl.
>>> It seems that x18 does not have the expected value, but it is also not set in
>>> the function...
>>
>> X18 is used to store the pointer to the pcpu data. It should only ever be set by the exception handler when we enter the kernel from userland.
> Hi Andrew,
>
> thanks for the response. Hmm.
> I was hoping that the answer would help me understand
> a panic that I'm observing when stress testing the TCP RACK stack. I'm transferring
> 10GB via scp and at some point (not right at the beginning), the machine panics.
> The machine is an eMAG system.
>
> Here is what I know:
>
> Initially it panics multiple times (always at the same place) in
> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16540
> when it is trying to read V_tcp_map_entries_limit.
>
> I discussed this with rrs@ and since we had no clue, I tried to just compile
> out the if condition.
>
> Then it panicked in
> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n16928
> at
> https://cgit.freebsd.org/src/tree/sys/netinet/tcp_stacks/rack.c#n15664
> which is basically the next place where a V_ variable is accessed.
>
> Please note that for debugging I'm using a kernel without VIMAGE support,
> since we initially thought that it might be related to a VNET bug.
>
> So I decided to look at the disassembly of rack_sndbuf_autoscale (I added some comments):
>
>    0xffff000001388a6c <+0>:   stp   x29, x30, [sp, #-32]!
>    0xffff000001388a70 <+4>:   str   x19, [sp, #16]
>    0xffff000001388a74 <+8>:   mov   x29, sp
>    0xffff000001388a78 <+12>:  ldr   x9, [x0, #24]       // x9 = rack->tp;
>    0xffff000001388a7c <+16>:  ldr   w8, [x0, #188]      // w8 = rack->r_ctl.cwnd_to_use
>    0xffff000001388a80 <+20>:  adrp  x12, 0xffff0000013ac000
>    0xffff000001388a84 <+24>:  ldr   w10, [x9, #52]      // w10 = tp->snd_wnd;
>    0xffff000001388a88 <+28>:  ldr   x11, [x18]
>    0xffff000001388a8c <+32>:  ldr   x11, [x11, #1256]
>    0xffff000001388a90 <+36>:  cmp   w8, w10
>    0xffff000001388a94 <+40>:  csel  w10, w8, w10, cc    // cc = lo, ul, last  // min(rack->r_ctl.cwnd_to_use, tp->snd_wnd);
> => 0xffff000001388a98 <+44>:  ldr   x11, [x11, #40]
>    0xffff000001388a9c <+48>:  ldr   x12, [x12, #2752]
>    0xffff000001388aa0 <+52>:  ldr   w11, [x11, x12]     // w11 = V_tcp_do_autosndbuf ???
>    0xffff000001388aa4 <+56>:  cbz   w11, 0xffff000001388be0 <rack_sndbuf_autoscale+372>
>    0xffff000001388aa8 <+60>:  ldr   x8, [x0, #32]       // x8 = rack->rc_inp
>    0xffff000001388aac <+64>:  ldr   x19, [x8, #120]     // x19 = so = x8->inp_socket
>    0xffff000001388ab0 <+68>:  ldrb  w8, [x19, #817]     // w8 = (x19->so_snd.sb_flags >> 8) & 0xff
>    0xffff000001388ab4 <+72>:  tbz   w8, #3, 0xffff000001388be0 <rack_sndbuf_autoscale+372>  // (so->so_snd.sb_flags & SB_AUTOSIZE) == 0
>    0xffff000001388ab8 <+76>:  ldr   w11, [x9, #52]      // w11 = tp->snd_wnd
>    0xffff000001388abc <+80>:  ldr   w8, [x19, #740]     // w8 = so->so_snd.sb_hiwat
>    0xffff000001388ac0 <+84>:  lsr   w11, w11, #2
>    0xffff000001388ac4 <+88>:  add   w11, w11, w11, lsl #2
>    0xffff000001388ac8 <+92>:  cmp   w11, w8
>    0xffff000001388acc <+96>:  b.cc  0xffff000001388be0 <rack_sndbuf_autoscale+372>  // b.lo, b.ul, b.last
>    0xffff000001388ad0 <+100>: ldr   w11, [x19, #736]
>    0xffff000001388ad4 <+104>: lsr   w8, w8, #3
>    0xffff000001388ad8 <+108>: lsl   w12, w8, #3
>    0xffff000001388adc <+112>: sub   w8, w12, w8
>    0xffff000001388ae0 <+116>: cmp   w11, w8
>    0xffff000001388ae4 <+120>: b.cc  0xffff000001388be0 <rack_sndbuf_autoscale+372>  // b.lo, b.ul, b.last
>    0xffff000001388ae8 <+124>: ldr   x8, [x18]
>    0xffff000001388aec <+128>: ldr   x8, [x8, #1256]
>    0xffff000001388af0 <+132>: ldr   x12, [x8, #40]
>    0xffff000001388af4 <+136>: adrp  x8, 0xffff0000013ac000
>    0xffff000001388af8 <+140>: ldr   x8, [x8, #2760]
>    0xffff000001388afc <+144>: ldr   w12, [x12, x8]
>    0xffff000001388b00 <+148>: cmp   w11, w12
>
> So it seems that the code accessing V_tcp_do_autosndbuf is:
>
>    0xffff000001388a80 <+20>:  adrp  x12, 0xffff0000013ac000
>    ...
>    0xffff000001388a88 <+28>:  ldr   x11, [x18]
>    0xffff000001388a8c <+32>:  ldr   x11, [x11, #1256]
>    ...
> => 0xffff000001388a98 <+44>:  ldr   x11, [x11, #40]
>    0xffff000001388a9c <+48>:  ldr   x12, [x12, #2752]
>    0xffff000001388aa0 <+52>:  ldr   w11, [x11, x12]     // w11 = V_tcp_do_autosndbuf ???
>
> and for V_tcp_autosndbuf_max it is:
>
>    0xffff000001388ae8 <+124>: ldr   x8, [x18]
>    0xffff000001388aec <+128>: ldr   x8, [x8, #1256]
>    0xffff000001388af0 <+132>: ldr   x12, [x8, #40]
>    0xffff000001388af4 <+136>: adrp  x8, 0xffff0000013ac000
>    0xffff000001388af8 <+140>: ldr   x8, [x8, #2760]
>    0xffff000001388afc <+144>: ldr   w12, [x12, x8]
>
> The #2752 versus #2760 could be the offset of the variable.
>
> Does the above code make sense to you? The code relevant for the crash seems to be:
>
>    0xffff000001388a88 <+28>:  ldr   x11, [x18]
>    0xffff000001388a8c <+32>:  ldr   x11, [x11, #1256]
>    0xffff000001388a98 <+44>:  ldr   x11, [x11, #40]
>
> Since it is crashing at 0xffff000001388a98 <+44>, my assumption was that x18 is wrong...
> But does this use fit your description?

This code is loading curthread from the pcpu data (via x18), then loading the pointer stored 1256 bytes into struct thread. I checked the offset of td_vnet and found it was at the correct location, so it would appear that the code is using VIMAGE and has a bad vnet pointer: the faulting load at <+44> is the first dereference of that td_vnet value.

The other assembly above also looks like it's using VIMAGE, as it has similar code with the same offsets.
>
> I'm trying to debug this on arm64, since I can reproduce it on arm64. But there is
> also a bug report that this happens on amd64: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257195
>
> Any idea what can be wrong? Any hint how to progress?

If you can reproduce it on amd64, it might pay to test with KASAN.

How stable is the bad pointer value? It might pay to add KASSERTs to the code to check that curvnet (the macro to get td_vnet) is not the bad value, or at least is greater than VM_MIN_KERNEL_ADDRESS.

Andrew