From: Mickaël Canévet
To: freebsd-bugs@freebsd.org
Date: Tue, 22 Mar 2011 11:53:14 +0100
Message-ID: <1300791194.2566.37.camel@pc286.embl.fr>
Subject: "Fatal double fault" panic

Hi,

I have a redundant NAS made of FreeBSD + HAST + ZFS and 24TB of disks. This morning my primary node crashed around 4:20am. On the console I can see:

Fatal double fault
rip = 0xffffffff805e78b8
rsp = 0xffffff8485d43fc0
rbp = 0xffffff8485d44010
cpuid = 1; apic id = 12
panic: double fault
cpuid = 1
KDB: stack backstrace:
#0 0xffffffff805f4e0e at kdb_backtrace+0x5e
#1 0xffffffff805c2d07 at panic+0x187
#2 0xffffffff808ac366 at dblfault_handler+0x96
#3 0xffffffff808950bd at Xdblfault+0xad
Uptime: 4d14h7m5s
Cannot dump. Device not defined or unavailable.

The only thing I can see on my munin graphs is strange I/O activity (disk, and network over my HAST link) that starts at 3am every morning and lasts about an hour and a half, so it was still in progress when the node crashed this morning. I double-checked my scheduled scripts and I do not run anything at that time, so I suspect a system script is responsible for this activity. I'm not sure that this I/O activity caused the crash, but it is the only lead I have.

I don't know exactly which mailing list this issue should be posted to. I can provide munin graphs if needed (cpu, network io, disk io, load, memory, netstat, open_files, processes, swap, vmstat, zfs_arc_cache_hits_by_cache, zfs_arc_cache_hits_by_data_type, zfs_arc_efficiency, zfs_arc_utilization, zfs_dmu_prefetch) for both the primary and secondary nodes.

Thanks a lot for your help.

Mickaël