Date: Tue, 6 Sep 2022 23:17:45 +0200 From: =?UTF-8?B?VMSzbA==?= Coosemans <tijl@FreeBSD.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: pho@freebsd.org, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: 95f773e59482 - main - i386 copyout_fast: improve detection of a fault on accessing userspace Message-ID: <20220906231745.1a0f3c15@FreeBSD.org> In-Reply-To: <YxdneZXqyaAPmVL0@kib.kiev.ua> References: <202208241925.27OJP9Fh069091@gitrepo.freebsd.org> <20220906171826.1629cfcf@FreeBSD.org> <YxdneZXqyaAPmVL0@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 6 Sep 2022 18:30:01 +0300 Konstantin Belousov <kostikbel@gmail.com> wrote: > On Tue, Sep 06, 2022 at 05:18:26PM +0200, T=C4=B3l Coosemans wrote: >> On Wed, 24 Aug 2022 19:25:09 GMT Konstantin Belousov <kib@FreeBSD.org> >> wrote: >>> The branch main has been updated by kib: >>>=20 >>> URL: https://cgit.FreeBSD.org/src/commit/?id=3D95f773e59482b1a3462d2fe3= 901532d51fb053b3 >>>=20 >>> commit 95f773e59482b1a3462d2fe3901532d51fb053b3 >>> Author: Konstantin Belousov <kib@FreeBSD.org> >>> AuthorDate: 2022-08-09 00:56:54 +0000 >>> Commit: Konstantin Belousov <kib@FreeBSD.org> >>> CommitDate: 2022-08-24 19:11:40 +0000 >>>=20 >>> i386 copyout_fast: improve detection of a fault on accessing usersp= ace >>> =20 >>> Do not blindly account a page fault occuring on the trampoline area, >>> as the userspace access fault. Check that it occured exactly in the >>> instruction that does that. >>> =20 >>> This avoids unneeded switches of address space on faults not needin= g the >>> switch, effectively converting machine resets due to tripple faults, >>> into regular panics. >>> =20 >>> Reviewed by: jhb >>> Tested by: pho >>> Sponsored by: The FreeBSD Foundation >>> MFC after: 1 week >>> Differential revision: https://reviews.freebsd.org/D36302 >>> --- >>> sys/i386/i386/copyout_fast.s | 16 ++++++++-------- >>> sys/i386/i386/exception.s | 32 ++++++++++++++++++++++++++++---- >>> 2 files changed, 36 insertions(+), 12 deletions(-) >>>=20 >>=20 >> I'm sporadically seeing a panic after this commit. It's caused by a >> page fault during the second rep; movsb in copyin_fast (copying between >> copyout_buf and the kernel). When I add the same special treatment as >> above for this instruction the panic goes away. =20 > The purpose of the patch was to change some tripple faults into normal > panics. The check for %eip belonging to the trampoline area is too > broad to be correct. Yes. > I suspect you see that leftover panics, which I am working on right now. Yes, it looks like this: panic: vm_fault_lookup: fault on nofault entry, addr: 0 GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD] Copyright (C) 2022 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.htm= l> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i386-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: panic: vm_fault_lookup: fault on nofault entry, addr: 0 time =3D 1662487582 KDB: stack backtrace: db_trace_self_wrapper(5,ffffffff,0,0,0,...) at db_trace_self_wrapper+0x28/f= rame 0xffc09cf4 kdb_backtrace(1a716ae0,100,e340e0,ffc09db4,ffc09e00,...) at kdb_backtrace+0= x2b/frame 0xffc09d4c vpanic(bef316,ffc09d88,ffc09d88,ffc09dac,b3181a,...) at vpanic+0xe9/frame 0= xffc09d68 panic(bef316,be68d1,0,ffc09dc4,480f308,...) at panic+0x14/frame 0xffc09d7c vm_fault_lookup(0,0,1a710701,0,0,...) at vm_fault_lookup+0x13a/frame 0xffc0= 9dac vm_fault(e340e0,0,1,0,0) at vm_fault+0x79/frame 0xffc09e30 vm_fault_trap(e340e0,3b,1,0,0,0) at vm_fault_trap+0x44/frame 0xffc09e58 trap_pfault(3b,0,0) at trap_pfault+0x119/frame 0xffc09ea0 trap(ffc09f6c,8,28,28,e247000,...) at trap+0x2d4/frame 0xffc09f60 calltrap() at 0xffc031ff/frame 0xffc09f60 --- trap 0xc, eip =3D 0x3b, esp =3D 0xffc09fac, ebp =3D 0xffc033dc --- (null)() at 0x3b/frame 0xffc033dc KDB: enter: panic 0x009a1dfd in dump_savectx () at /home/tijl/freebsd/base/main/sys/kern/kern_shutdown.c:404 (kgdb) #0 0x009a1dfd in dump_savectx () at /home/tijl/freebsd/base/main/sys/kern/kern_shutdown.c:404 Backtrace stopped: Cannot access memory at address 0xffc09af4 >> The panics happened while in X and the crash dump doesn't seem to >> contain the trampoline stack so I ended up rewriting copyin_fast and >> copyout_fast so the copying between the kernel and copyout_buf ran on >> the kernel stack instead of the trampoline stack (see attached patch). >> Now with this patch the panics are also gone so I suspect the problem is >> simply that the trampoline stack is too small to handle some page >> faults. >>=20 >> So, is this patch correct then? I'm not sure it's actually safe to >> handle page faults in this context with interrupts disabled. =20 > I do not think that the patch is correct, and I even surprised that you > do not see a sporadic reboots with it applied (do you?). When you change > address space to do the fast bcopy, kernel stack gets unmapped. So if > the page fault occurs because user page is not resident, fault must cause > tripple fault and reboot the system. >=20 > The idea about too small trampoline stack might have some merits, did > you tried to increase the trampoline stack size? >=20 > I am currently reworking the copyin/copyout_fast to avoid pushing the > bcopy args into trampoline stack, but there is one bug not yet fixed > in the change. See > https://kib.kiev.ua/git/gitweb.cgi?p=3Ddeviant3.git;a=3Dshortlog;h=3Drefs= /heads/ast I'm not seeing reboots. My patch is pretty much identical to yours. I'm running your patch now and everything seems stable. Increasing the trampoline stack size did not fix the problem so it must be something else. What panic are you still seeing?
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20220906231745.1a0f3c15>