Date:      Mon, 02 Jun 2014 08:49:07 -0700
From:      Mike Carlson <mike@bayphoto.com>
To:        Steven Hartland <killing@multiplay.co.uk>, freebsd-fs@freebsd.org
Subject:   Re: ZFS Kernel Panic on 10.0-RELEASE
Message-ID:  <538C9CF3.6070208@bayphoto.com>
In-Reply-To: <782C34792E95484DBA631A96FE3BEF20@multiplay.co.uk>
References:  <5388D64D.4030400@bayphoto.com> <EC2EA442-56FC-46B4-A1E2-97523029B7B3@mail.turbofuzz.com> <5388E5B4.3030002@bayphoto.com> <538BBEB7.4070008@bayphoto.com> <782C34792E95484DBA631A96FE3BEF20@multiplay.co.uk>

On 6/2/2014 2:12 AM, Steven Hartland wrote:
> ----- Original Message ----- From: "Mike Carlson" <mike@bayphoto.com>
>
>> On 5/30/2014 1:10 PM, Mike Carlson wrote:
>> > On 5/30/2014 12:48 PM, Jordan Hubbard wrote:
>> >> On May 30, 2014, at 12:04 PM, Mike Carlson <mike@bayphoto.com> wrote:
>> >>
>> >>> Over the weekend, we had upgraded one of our servers from
>> >>> 9.1-RELEASE to 10.0-RELEASE, and then the zpool was upgraded (from
>> >>> 28 to 5000)
>> >>>
>> >>> Tuesday afternoon, the server suddenly rebooted (kernel panic),
>> >>> and as soon as it tried to remount all of its ZFS volumes, it
>> >>> panic'd again.
>> >> What's the panic text?  That's pretty crucial in figuring out
>> >> whether this is recoverable (e.g. if it's spacemap corruption
>> >> related, probably not).
>> >>
>> >> - Jordan
>> >>
>> >>
>> >>
>> > I had linked the pictures I took of the console, but here is my
>> > manual reproduction:
>> >
>> >    Fatal trap 12: page fault while in kernel mode
>> >    cpuid = 7; apic id = 07
>> >    fault virtual address    = 0x4a0
>> >    fault code               = supervisor read data, page not present
>> >    instruction pointer      = 0x20:0xffffffff81a7f39f
>> >    stack pointer            = 0x28:0xfffffe1834789570
>> >    frame pointer            = 0x28:0xfffffe18347895b0
>> >    code segment             = base 0x0, limit 0xfffff, type 0x1b
>> >                              = DPL 0, pres 1, long 1, def32 0, gran 1
>> >    processor eflags         = interrupt enabled, resume, IOPL = 0
>> >    current process          = 1849 (txg_thread_enter)
>> >    trap number              = 12
>> >    panic: page fault
>> >    cpuid = 7
>> >    KDB: stack backtrace:
>> >    #0 0xffffffff808e7dd0 at kdb_backtrace+0x60
>> >    #1 0xffffffff808af8b5 at panic+0x155
>> >    #2 0xffffffff80c8e629 at trap_fatal+0x3a2
>> >    #3 0xffffffff80c8e969 at trap_pfault+0x2c9
>> >    #4 0xffffffff80c8e0f6 at trap+0x5e6
>> >    #5 0xffffffff80c75392 at calltrap+0x8
>> >    #6 0xffffffff81a53b5a at dsl_dataset_block_kill+0x3a
>> >    #7 0xffffffff81a50967 at dnode_sync+0x237
>> >    #8 0xffffffff81a48fcb at dmu_objset_sync_dnodes+0x2b
>> >    #9 0xffffffff81a48e4d at dmu_objset_sync+0x1ed
>> >    #10 0xffffffff81a5d29a at dsl_pool_sync+0xca
>> >    #11 0xffffffff81a78a4e at spa_sync+0x52e
>> >    #12 0xffffffff81a81925 at txg_sync_thread+0x375
>> >    #13 0xffffffff8088198a at fork_exit+0x9a
>> >    #14 0xffffffff80c758ce at fork_trampoline+0xe
>> >    uptime: 46s
>> >    Automatic reboot in 15 seconds - press a key on the console to abort
>> >
>> This just happened again to another server. We upgraded two servers 
>> on the same morning, and now both of them exhibit this corrupted zfs 
>> volume and panic behavior.
>>
>> Out of all the volumes, one of them is causing the panic, and the 
>> panic message is nearly identical.
>>
>> I have 4 snapshots over the last 24 hours, so hopefully a snapshot 
>> from noon today can be sent to a new volume ( zfs send | zfs recv )
>>
>> I guess I can now rule out it being a hardware issue; this is clearly
>> a problem related to the upgrade (freebsd-update was used). I first
>> thought the first system had a bad upgrade, perhaps a mix and match
>> of 9.2 binaries running on a 10 kernel, but I used the
>> 'freebsd-update IDS' command to verify the integrity of the install,
>> and it looked good; the only differences were config files in /etc/
>> that we manage.
>>
>
> Do you have a kernel crash dump from this?
>
> Also can you confirm if you're on amd64 or just i386?
>
>    Regards
>    Steve
>
>

I don't have a crash dump, and this is on amd64.

I might be able to get a crash dump on one of them, the other is back up 
and running. It is a little challenging because the system I can do this 
on has zfs on root, but I have a spare drive I can use as the swap volume.
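
For anyone following along, a minimal sketch of the rc.conf settings this
would involve, assuming the spare drive carries a swap partition at
/dev/ada4p1 (a hypothetical device name, not taken from this system;
substitute the real one):

```shell
# /etc/rc.conf fragment (sketch only; the device name below is an assumption)
dumpdev="/dev/ada4p1"   # hypothetical swap partition on the spare drive
dumpdir="/var/crash"    # directory where savecore(8) writes the dump on the next boot
```

The same device can also be selected on a running system with dumpon(8).
The partition has to be large enough to hold the dump.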

Mike C


