From owner-freebsd-bugs@FreeBSD.ORG  Tue Mar 22 11:23:14 2011
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
Delivered-To: freebsd-bugs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 64DF3106564A
	for <freebsd-bugs@freebsd.org>; Tue, 22 Mar 2011 11:23:14 +0000 (UTC)
	(envelope-from canevet@embl.fr)
Received: from emblmta1.embl.fr (emblmta1.embl.fr [193.49.43.176])
	by mx1.freebsd.org (Postfix) with ESMTP id EDD038FC19
	for <freebsd-bugs@freebsd.org>; Tue, 22 Mar 2011 11:23:13 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.63,224,1299452400"; d="asc'?scan'208";a="1507926"
Received: from unknown (HELO [172.26.15.11]) ([172.26.15.11])
	by emblmta1.embl.fr with ESMTP/TLS/DHE-RSA-CAMELLIA256-SHA;
	22 Mar 2011 11:53:15 +0100
From: =?ISO-8859-1?Q?Micka=EBl_Can=E9vet?= <canevet@embl.fr>
To: freebsd-bugs@freebsd.org
Content-Type: multipart/signed; micalg="pgp-sha1";
	protocol="application/pgp-signature";
	boundary="=-NrtsD7Bt/Ww1V0avpZav"
Date: Tue, 22 Mar 2011 11:53:14 +0100
Message-ID: <1300791194.2566.37.camel@pc286.embl.fr>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.2 
Subject: "Fatal double fault" panic
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Mar 2011 11:23:14 -0000


--=-NrtsD7Bt/Ww1V0avpZav
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi,

I have a redundant NAS made of FreeBSD + HAST + ZFS and 24TB of disks.

This morning my primary node crashed around 4:20am.

On the console I can see:

Fatal double fault
rip =3D 0xffffffff805e78b8
rsp =3D 0xffffff8485d43fc0
rbp =3D 0xffffff8485d44010
cpuid =3D 1; apic id =3D 12
panic: double fault
cpuid =3D 1
KDB: stack backstrace:
#0 0xffffffff805f4e0e at kdb_backtrace+0x5e
#1 0xffffffff805c2d07 at panic+0x187
#2 0xffffffff808ac366 at dblfault_handler+0x96
#3 0xffffffff808950bd at Xdblfault+0xad
Uptime: 4d14h7m5s
Cannot dump. Device not defined or unavailable.

The only thing I can see on my munin graphs is unusual I/O activity
(disk, plus network over my HAST link) that starts at 3am every morning
and lasts about an hour and a half, so it was still under way when the
node crashed this morning. I double-checked my scheduled scripts and
none of them runs at that time, so I suspect a system script is
responsible for this activity. I'm not sure that this I/O activity
caused the crash, but it is the only lead I have.

I am not sure which mailing list is the right one for this issue.

I can provide munin graphs if needed (cpu, network io, disk io,
load, memory, netstat, open_files, processes, swap, vmstat,
zfs_arc_cache_hits_by_cache, zfs_arc_cache_hits_by_data_type,
zfs_arc_efficiency, zfs_arc_utilization, zfs_dmu_prefetch) for both
primary and secondary node.

Thanks a lot for your help.

Micka=C3=ABl

--=-NrtsD7Bt/Ww1V0avpZav
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEABECAAYFAk2If5oACgkQZjBmN5Hi/YZvpACeNKwwVEA3Co07q7PD14G0vY7r
D7IAn1nGfRyYq0eqTONr2LreRiPouiXK
=emnN
-----END PGP SIGNATURE-----

--=-NrtsD7Bt/Ww1V0avpZav--