From: Kai Gallasch
To: Andriy Gapon
Cc: freebsd-current@freebsd.org
Date: Sat, 14 Nov 2009 02:21:21 +0100
Subject: Re: 8.0RC2 amd64 - kernel panic running make buildworld

On Fri, 13 Nov 2009 15:55:42 +0200, Andriy Gapon wrote:
>
> on 13/11/2009 15:48 Kai Gallasch said the following:
> > On Fri, 13 Nov 2009 10:08:45 +0200, Andriy Gapon wrote:
> >> Kai,
> >> I have a hunch, could you please try the following _sledgehammer_
> >> patch (only kernel build/install is needed):
> >> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> >> index 44b71f3..a456609 100644
> >> --- a/sys/amd64/amd64/pmap.c
> >> +++ b/sys/amd64/amd64/pmap.c
> >> @@ -2981,6 +2981,7 @@ setpte:
> >>           * Map the superpage.
> >>           */
> >>          pde_store(pde, PG_PS | newpde);
> >> +        pmap_invalidate_all(pmap);
> >>
> >>          pmap_pde_promotions++;
> >>          CTR2(KTR_PMAP, "pmap_promote_pde: success for va %#lx"
> >>
> >> This will slow down an act of promotion to a superpage, but should
> >> not have any visible impact on overall performance.
> >
> > Andriy,
> >
> > I tried the patch with hw.mca.enabled="1", rebuilt the kernel
> > (although normally I never build kernels on Friday 13th :-) and ran
> > buildworld -j8 five times in a row. No sign of a machine check
> > exception, no reboot.
>
> I think that this is good news.
> This is not a fix, but the fact that it helps should help us find a
> proper solution.

Hi.

The patch did help the machine survive a makeworld, but now I have hit
another machine check exception on this server. It happened with your
patch active and vm.pmap.pg_ps_enabled="1".

I was copying data from a remote server over an NFS mount to the
unstable server. The destination was a local ZFS filesystem.

----------------
sonnenkraft:~ # MCA: CPU 7 UNCOR PCC OVER DTLB L1 error
MCA: Address 0xff800d860000

Fatal trap 28: machine check trap while in kernel mode
cpuid = 7; apic id = 07
instruction pointer     = 0x20:0xffffffff80e5f0b2
stack pointer           = 0x28:0xffffff8241f8d7d0
frame pointer           = 0x28:0xffffff8241f8da40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 0 (spa_zio_1)
[thread pid 0 tid 100193 ]
Stopped at      lzjb_compress+0x162:    leal    0x1(%rdx),%edi
db> bt
Tracing pid 0 tid 100193 td 0xffffff000732aab0
lzjb_compress() at lzjb_compress+0x162
zio_compress_data() at zio_compress_data+0xbe
zio_write_bp_init() at zio_write_bp_init+0xc2
zio_execute() at zio_execute+0x77
zio_ready() at zio_ready+0x124
zio_execute() at zio_execute+0x77
taskq_run() at taskq_run+0x13
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8241f8dd30, rbp = 0 ---
----------------

After this I tried copying to the local ZFS filesystem over NFS again,
and again got a machine check exception.

With vm.pmap.pg_ps_enabled="0" set in loader.conf and after a reboot,
the server survives the NFS copy and stays stable.

--Kai.