Date:      Mon, 20 Feb 2017 15:10:44 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>, mjg@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, svn-src-head@freebsd.org, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
Message-ID:  <CD8044D2-C11A-4ABE-B72D-62BDDE302C7C@dsl-only.net>
In-Reply-To: <83428304-87BE-413C-BAB9-8FF218E7661C@dsl-only.net>
References:  <2FD12B8F-2255-470A-98D4-2DCE9C7495F5@dsl-only.net> <20170220191044.GA8526@dft-labs.eu> <83428304-87BE-413C-BAB9-8FF218E7661C@dsl-only.net>

On 2017-Feb-20, at 2:58 PM, Mark Millard <markmi@dsl-only.net> wrote:

> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>
>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>>
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hang ups in fairly short
>>> order after booting.
>>>
>>> Some approximate bisecting for the kernel led to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>>
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>>
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>>
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>>
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>>
>>> Of course I did not try them all in either direction. :)
>>>
>>
>> I found that spin mutexes were not properly handling this, fixed in
>> r313996.
>>
>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>> fcmpset to simulate failures. Everything works, while it would easily
>> fail without the patch.
>>
>> That said, I hope this concludes the 'missing check for not-reread value
>> of failed fcmpset' saga.
>>
>> -- 
>> Mateusz Guzik <mjguzik gmail.com>
>
> I tried to update from -r313864 to -r313999 in my amd64 context
> (a VirtualBox machine under macOS) but it now crashes late in
> the boot sequence (after it processes a dump if I make one but
> before I can log in).
>
> This update was via my usual explicit svnlite update; buildworld
> buildkernel; etc. production style build of world and kernel,
> including use of MALLOC_PRODUCTION.
>
> The window shows:
>
> _vm_map_lock+0xf
> vm_map_wire+0x32
> rtR0MemObjNativeLockInMap+0x8c
> rtR0MemObjNativeLockUser+0x51
> RTR0MemObjLockUserTag+0x231
> vbglR0HGCMInternalPreprocessCall+0x65d
> VbglR0HGCMInternalCall+0x17c
> vgdrvIoCtl_HGCMCall+0x43f
> VGDrvCommonIoCtl+0x261
> vgdrvFreeBSDIOCtl+0x2cd
> devfs_ioctl+0xae
> VOP_IOCTL_APV+0x88
> vn_ioctl+0x161
> devfs_ioctl_f+0x1f
> kern_ioctl+0x280
> sys_ioctl+0x13f
> amd64_syscall+0x397
> Xfast_syscall+0xfb

More detail from booting with the -r313864 kernel.old
and using kgdb on what the dump produced:

# kgdb kernel.debug /var/crash/vmcore.
/var/crash/vmcore.0    /var/crash/vmcore.last
# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Starting vboxservice.
<118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 18:37:45) release log
<118>00:00:00.000120 main     Log opened 2017-02-20T22:38:46.348080000Z
<118>00:00:00.000162 main     OS Product: FreeBSD
<118>00:00:00.000171 main     OS Release: 12.0-CURRENT
<118>00:00:00.000180 main     OS Version: FreeBSD 12.0-CURRENT  r313999M
<118>00:00:00.000192 main     Executable: /usr/local/sbin/VBoxService
<118>00:00:00.000194 main     Process ID: 609
<118>00:00:00.000196 main     Package type: BSD_64BITS_GENERIC (OSE)


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0xd6
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d4ebaf
stack pointer           = 0x28:0xfffffe0122e2bef0
frame pointer           = 0x28:0xfffffe0122e2bf00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         =3D 609 (VBoxService)

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/modules/vboxguest.ko...done.
Loaded symbols for /boot/modules/vboxguest.ko
#0  doadump (textdump=0) at pcpu.h:232
232             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  doadump (textdump=0) at pcpu.h:232
#1  0xffffffff8039dd0b in db_dump (dummy=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:546
#2  0xffffffff8039db0f in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:453
#3  0xffffffff8039d884 in db_command_loop () at /usr/src/sys/ddb/db_command.c:506
#4  0xffffffff803a0814 in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:254
#5  0xffffffff80a9c0c3 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80ed25d2 in trap_fatal (frame=0xfffffe0122e2be30, eva=214) at /usr/src/sys/amd64/amd64/trap.c:796
#7  0xffffffff80ed27dc in trap_pfault (frame=0xfffffe0122e2be30, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:658
#8  0xffffffff80ed1e90 in trap (frame=0xfffffe0122e2be30) at /usr/src/sys/amd64/amd64/trap.c:421
#9  0xffffffff80eb6be1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#10 0xffffffff80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at /usr/src/sys/vm/vm_map.c:501
#11 0xffffffff80d51ea2 in vm_map_wire (map=<value optimized out>, start=4534272, end=4538368, flags=1) at /usr/src/sys/vm/vm_map.c:2534
#12 0xffffffff8265291c in rtR0MemObjNativeLockInMap () from /boot/modules/vboxguest.ko
#13 0xffffffff82652881 in rtR0MemObjNativeLockUser () from /boot/modules/vboxguest.ko
#14 0xffffffff8264ec01 in RTR0MemObjLockUserTag () from /boot/modules/vboxguest.ko
#15 0xffffffff82624afd in vbglR0HGCMInternalPreprocessCall () from /boot/modules/vboxguest.ko
#16 0xffffffff8262411a in VbglR0HGCMInternalCall () from /boot/modules/vboxguest.ko
#17 0xffffffff8261ec4f in vgdrvIoCtl_HGCMCall () from /boot/modules/vboxguest.ko
#18 0xffffffff8261d221 in VGDrvCommonIoCtl () from /boot/modules/vboxguest.ko
#19 0xffffffff8262327d in vgdrvFreeBSDIOCtl () from /boot/modules/vboxguest.ko
#20 0xffffffff8092976e in devfs_ioctl (ap=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:805
#21 0xffffffff8103ef58 in VOP_IOCTL_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1067
#22 0xffffffff80b29431 in vn_ioctl (fp=0xfffff80006d37730, com=<value optimized out>, data=0xfffffe0122e2c870, active_cred=0xfffff80006495a00, td=<value optimized out>) at vnode_if.h:448
#23 0xffffffff80929d5f in devfs_ioctl_f (fp=<value optimized out>, com=<value optimized out>, data=<value optimized out>, cred=<value optimized out>, td=0xfffff8001504e000) at /usr/src/sys/fs/devfs/devfs_vnops.c:763
#24 0xffffffff80ab8bf0 in kern_ioctl (td=<value optimized out>, fd=3, com=<value optimized out>, data=0xfffffe0122e2c870 "\031\002R\031P") at file.h:322
#25 0xffffffff80ab88bf in sys_ioctl (td=<value optimized out>, uap=0xfffffe0122e2ca30) at /usr/src/sys/kern/sys_generic.c:743
#26 0xffffffff80ed2e27 in amd64_syscall (td=0xfffff8001504e000, traced=0) at subr_syscall.c:135
#27 0xffffffff80eb6ecb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#28 0x0000000800c5317a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
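
For readers following the quoted "not-reread value of failed fcmpset"
remark: below is a rough userland sketch of that pattern, not the kernel
code and not the r313996 change. Everything named toy_* is hypothetical
and only for illustration; C11 atomic_compare_exchange_strong stands in
for the kernel's atomic_fcmpset, and the "fail every other call" counter
is my stand-in for the injected failure described in the quote. The
point it tries to show is that a failed fcmpset may leave the expected
value untouched, so the caller has to look at that value before deciding
the lock is really held by someone else.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_UNLOCKED ((uintptr_t)0)

/* Hypothetical call counter used only to inject failures. */
static unsigned toy_calls;

/*
 * Hypothetical stand-in for fcmpset: returns true on success. On a real
 * mismatch it stores the value it found into *expected. Every other call
 * fails spuriously and leaves *expected unchanged (an assumption made to
 * mimic the injected failures from the quoted message).
 */
static bool
toy_fcmpset(_Atomic uintptr_t *p, uintptr_t *expected, uintptr_t desired)
{
	if (toy_calls++ % 2 == 0)
		return (false);	/* spurious failure, nothing re-read */
	return (atomic_compare_exchange_strong(p, expected, desired));
}

/*
 * Toy try-lock: a failure alone does not mean the lock is owned. Only
 * the value left in 'v' says so; if 'v' still reads as unlocked, the
 * failure was spurious and the attempt is retried.
 */
static bool
toy_trylock(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = TOY_UNLOCKED;

	assert(tid != TOY_UNLOCKED);
	for (;;) {
		if (toy_fcmpset(lock, &v, tid))
			return (true);
		if (v != TOY_UNLOCKED)
			return (false);	/* genuinely held by another owner */
		/* v was not re-read to a new value: retry */
	}
}
```

A caller that returned failure as soon as toy_fcmpset() returned false,
without checking v, would report a free lock as busy under the injected
failures; that is the kind of mistake the sketch is meant to illustrate.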


===
Mark Millard
markmi at dsl-only.net



