Date:      Sat, 14 Nov 2009 02:21:21 +0100
From:      Kai Gallasch <gallasch@free.de>
To:        Andriy Gapon <avg@icyb.net.ua>
Cc:        freebsd-current@freebsd.org
Subject:   Re: 8.0RC2 amd64 - kernel panic running make buildworld
Message-ID:  <20091114022121.217dd831@orwell.free.de>
In-Reply-To: <4AFD655E.5020801@icyb.net.ua>
References:  <1031257439203@webmail57.yandex.ru> <hdc73v$4rt$1@ger.gmane.org> <941257966918@webmail42.yandex.ru> <200911111504.14906.jhb@freebsd.org> <20091112195932.5875387e@orwell.free.de> <4AFD140D.7010407@icyb.net.ua> <20091113144804.2c0fb90f@orwell.free.de> <4AFD655E.5020801@icyb.net.ua>

On Fri, 13 Nov 2009 15:55:42 +0200,
Andriy Gapon <avg@icyb.net.ua> wrote:

> on 13/11/2009 15:48 Kai Gallasch said the following:
> > On Fri, 13 Nov 2009 10:08:45 +0200,
> > Andriy Gapon <avg@icyb.net.ua> wrote:
> >> Kai,
> >> I have a hunch, could you please try the following _sledgehammer_
> >> patch (only kernel build/install is needed):
> >> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> >> index 44b71f3..a456609 100644
> >> --- a/sys/amd64/amd64/pmap.c
> >> +++ b/sys/amd64/amd64/pmap.c
> >> @@ -2981,6 +2981,7 @@ setpte:
> >>  	 * Map the superpage.
> >>  	 */
> >>  	pde_store(pde, PG_PS | newpde);
> >> +	pmap_invalidate_all(pmap);
> >>
> >>  	pmap_pde_promotions++;
> >>  	CTR2(KTR_PMAP, "pmap_promote_pde: success for va %#lx"
> >>
> >> This will slow down an act of promotion to a superpage, but should
> >> not have any visible impact on overall performance.
> > 
> > Andriy,
> > 
> > I tried the patch with
> > hw.mca.enabled="1", rebuilt the kernel (although normally I never
> > build kernels on Friday 13th :-) and ran buildworld -j8 five
> > times in a row. No sign of a machine check exception, no reboot.
> 
> I think that this is good news.
> This is not a fix, but the fact that it helps should help us find a
> proper solution.

Hi. The patch did help the server survive a makeworld.

But now I got another machine check exception on this server. It
happened with your patch applied and vm.pmap.pg_ps_enabled="1", while I
was copying data from a remote server over an NFS mount to the unstable
server. The destination was a local ZFS filesystem.

----------------

sonnenkraft:~ # MCA: CPU 7 UNCOR PCC OVER DTLB L1 error
MCA: Address 0xff800d860000


Fatal trap 28: machine check trap while in kernel mode
cpuid = 7; apic id = 07
instruction pointer	= 0x20:0xffffffff80e5f0b2
stack pointer	        = 0x28:0xffffff8241f8d7d0
frame pointer	        = 0x28:0xffffff8241f8da40
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, IOPL = 0
current process		= 0 (spa_zio_1)
[thread pid 0 tid 100193 ]
Stopped at      lzjb_compress+0x162:    leal    0x1(%rdx),%edi
db> bt
Tracing pid 0 tid 100193 td 0xffffff000732aab0
lzjb_compress() at lzjb_compress+0x162
zio_compress_data() at zio_compress_data+0xbe
zio_write_bp_init() at zio_write_bp_init+0xc2
zio_execute() at zio_execute+0x77
zio_ready() at zio_ready+0x124
zio_execute() at zio_execute+0x77
taskq_run() at taskq_run+0x13
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8241f8dd30, rbp = 0 ---

----------------

After this I tried copying to the local ZFS filesystem over NFS again,
and again got an exception.

With vm.pmap.pg_ps_enabled="0" set in loader.conf and after a reboot,
the server survives the NFS copy and stays stable.
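
For reference, a minimal sketch of that workaround, assuming the usual
/boot/loader.conf location. The hw.mca.enabled line is carried over from
the earlier buildworld test and is an assumption here; it only keeps
machine check reporting switched on:

```
# /boot/loader.conf
# Disable superpage (2MB page) promotion in the pmap layer.
vm.pmap.pg_ps_enabled="0"
# Keep machine check reporting enabled (assumed, as in the earlier test).
hw.mca.enabled="1"
```

Both are loader tunables, so they only take effect after a reboot.
Running "sysctl vm.pmap.pg_ps_enabled" afterwards should show which
value the kernel actually picked up.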

--Kai.


 




