Date: Tue, 13 Feb 2007 21:02:23 +0200
From: Kostik Belousov <kostikbel@gmail.com>
To: Kris Kennaway <kris@obsecurity.org>
Cc: amd64@freebsd.org, current@freebsd.org
Subject: Re: Page fault in amd64 pmap_qremove from vm_thread_new()
Message-ID: <20070213190222.GE25802@deviant.kiev.zoral.com.ua>
In-Reply-To: <20070213185312.GF67616@xor.obsecurity.org>
References: <20070213185312.GF67616@xor.obsecurity.org>

On Tue, Feb 13, 2007 at 01:53:12PM -0500, Kris Kennaway wrote:
> I get this frequently when running stress2 on an 8-core amd64 system:
>
> Fatal trap 12: page fault while in kernel mode
> Fatal trap 12: page fault while in kernel mode
>
>
> cpuid = 2;
>
>
> apic id = 02
>
> Fatal trap 12: page fault while in kernel mode
>
> cpuid = 5; fault virtual address = 0xffff807ffffff040
> Fatal trap 12: page fault while in kernel mode
> Fatal trap 12: page fault while in kernel mode
>
> cpuid = 4; apic id = 05
> apic id = 04
> fault virtual address = 0xffff807ffffff0e0
> fault virtual address = 0xffff807ffffff0b8
> cpuid = 0; fault code = supervisor write data, page not present
>
> instruction pointer = 0x8:0xffffffff803deedd
> cpuid = 3; stack pointer = 0x10:0xffffffffc7647720
> fault code = supervisor write data, page not present
>
> instruction pointer = 0x8:0xffffffff803deedd
> apic id = 00
> stack pointer = 0x10:0xffffffffcfd7e720
> fault code = supervisor write data, page not present
> frame pointer = 0x10:0xffffffffc7647730
> frame pointer = 0x10:0xffffffffcfd7e730
> Fatal trap 12: page fault while in kernel mode
>
> cpuid = 6;
> instruction pointer = 0x8:0xffffffff803deedd
>
> stack pointer = 0x10:0xffffffffb2b93720
>
> frame pointer = 0x10:0xffffffffb2b93730
>
> code segment = base 0x0, limit 0xfffff, type 0x1b
>
> = DPL 0, pres 1, long 1, def32 0, gran 1
>
> processor eflags =
> interrupt enabled,
> resume, Fatal trap 12: page fault while in kernel mode
> apic id = 06
> cpuid = 7; fault virtual address = 0xffff807ffffff108
> apic id = 07
> fault code = supervisor write data, page not present
> code segment = base 0x0, limit 0xfffff, type 0x1b
> apic id = 03
> = DPL 0, pres 1, long 1, def32 0, gran 1
> fault virtual address = 0xffff807ffffff068
> IOPL = 0
> fault code = supervisor write data, page not present
> fault virtual address = 0xffff807ffffff018
> instruction pointer = 0x8:0xffffffff803deedd
> instruction pointer = 0x8:0xffffffff803deedd
> Fatal trap 12: page fault while in kernel mode
> stack pointer = 0x10:0xffffffffbf901720
> cpuid = 4; stack pointer = 0x10:0xffffffffb1c11720
> processor eflags = frame pointer = 0x10:0xffffffffb1c11730
> interrupt enabled, resume, fault code = supervisor write data, page not present
> IOPL = 0
> instruction pointer = 0x8:0xffffffff803deedd
> current process = stack pointer = 0x10:0xffffffffd5b25720
> frame pointer = 0x10:0xffffffffbf901730
> frame pointer = 0x10:0xffffffffd5b25730
> code segment = base 0x0, limit 0xfffff, type 0x1b
> current process = = DPL 0, pres 1, long 1, def32 0, gran 1
> code segment = base 0x0, limit 0xfffff, type 0x1b
> code segment = base 0x0, limit 0xfffff, type 0x1b
> 18747 (thr2)
> [thread pid 18747 tid 142909 ]
> Stopped at pmap_qremove+0x2d: movq $0,(%rcx,%rax,8)
> db> wh
> Tracing pid 18747 tid 142909 td 0xffffff0095710cd0
> pmap_qremove() at pmap_qremove+0x2d
> vm_thread_new() at vm_thread_new+0x8d
> thread_init() at thread_init+0x16
> slab_zalloc() at slab_zalloc+0x282
> uma_zone_slab() at uma_zone_slab+0x1ae
> uma_zalloc_bucket() at uma_zalloc_bucket+0x19d
> uma_zalloc_arg() at uma_zalloc_arg+0x3a3
> thread_alloc() at thread_alloc+0x1f
> create_thread() at create_thread+0xc5
> kern_thr_new() at kern_thr_new+0x75
> thr_new() at thr_new+0x62
> syscall() at syscall+0x310
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (455, FreeBSD ELF64, thr_new), rip = 0x8007a1cac, rsp = 0x7fffffffdef8, rbp = 0 ---
> db> show allpcpu
> Current CPU: 2
>
> cpuid = 0
> curthread = 0xffffff00717e8290: pid 18944 "thr2"
> curpcb = 0xffffffffe2e33d50
> fpcurthread = none
> idlethread = 0xffffff00b9aa6520: pid 17 "idle: cpu0"
> spin locks held:
>
> cpuid = 1
> curthread = 0xffffff0015e9d7b0: pid 18736 "thr2"
> curpcb = 0xffffffffbceefd50
> fpcurthread = none
> idlethread = 0xffffff00b9aa6290: pid 16 "idle: cpu1"
> spin locks held:
> exclusive spin mutex sio r = 0 (0xffffffff806bf3c0) locked @ dev/sio/sio.c:1390
>
> cpuid = 2
> curthread = 0xffffff0095710cd0: pid 18747 "thr2"
> curpcb = 0xffffffffcfd7ed50
> fpcurthread = none
> idlethread = 0xffffff00b9aa6000: pid 15 "idle: cpu2"
> spin locks held:
>
> cpuid = 3
> curthread = 0xffffff00ad485290: pid 18743 "thr2"
> curpcb = 0xffffffffd5b25d50
> fpcurthread = none
> idlethread = 0xffffff00b9a63cd0: pid 14 "idle: cpu3"
> spin locks held:
>
> cpuid = 4
> curthread = 0xffffff0098fc7000: pid 18942 "thr2"
> curpcb = 0xffffffffc77fad50
> fpcurthread = none
> idlethread = 0xffffff00b9a63000: pid 13 "idle: cpu4"
> spin locks held:
> exclusive spin mutex turnstile chain r = 0 (0xffffffff80613ed8) locked @ kern/subr_turnstile.c:489
>
> cpuid = 5
> curthread = 0xffffff00215b8cd0: pid 18708 "thr2"
> curpcb = 0xffffffffb2b93d50
> fpcurthread = none
> idlethread = 0xffffff00b9a8fcd0: pid 12 "idle: cpu5"
> spin locks held:
>
> cpuid = 6
> curthread = 0xffffff005b72d520: pid 18718 "thr2"
> curpcb = 0xffffffffb1c11d50
> fpcurthread = none
> idlethread = 0xffffff00b9a8fa40: pid 11 "idle: cpu6"
> spin locks held:
>
> cpuid = 7
> curthread = 0xffffff0078aae7b0: pid 18782 "thr2"
> curpcb = 0xffffffffbf901d50
> fpcurthread = none
> idlethread = 0xffffff00b9a8f7b0: pid 10 "idle: cpu7"
> spin locks held:
>
> For some reason ddb doesn't give sensible backtraces for the running threads:
>
> db> wh 18944
> Tracing pid 18944 tid 130433 td 0xffffff009daa7290
> fork_trampoline() at fork_trampoline
> db> wh 18736
> Tracing pid 18736 tid 165977 td 0xffffff00632b2cd0
> fork_trampoline() at fork_trampoline
> db> wh 18747
> Tracing pid 18747 tid 165890 td 0xffffff0037403000
> fork_trampoline() at fork_trampoline
> db> wh 18743
> Tracing pid 18743 tid 165929 td 0xffffff004f59e000
> fork_trampoline() at fork_trampoline
> db> wh 18942
> Tracing pid 18942 tid 130531 td 0xffffff000a166520
> fork_trampoline() at fork_trampoline
> db> wh 18708
> Tracing pid 18708 tid 166269 td 0xffffff005c28a290
> fork_trampoline() at fork_trampoline
> db> wh 18718
> Tracing pid 18718 tid 111088 td 0xffffff0081f51a40
> fork_trampoline() at fork_trampoline
> db> wh 18782
> Tracing pid 18782 tid 166078 td 0xffffff0052b4c000
> fork_trampoline() at fork_trampoline

Is the backtrace for the faulted thread always the same? And is this CURRENT?

I'm staring at similar (seemingly random) corruption on amd64 6.2-RELEASE.
The machine has already produced more than 2 core dumps.