Date:      Wed, 04 Jun 2014 13:25:03 -0700
From:      Mike Carlson <mike@bayphoto.com>
To:        Steven Hartland <killing@multiplay.co.uk>, freebsd-fs@freebsd.org
Subject:   Re: ZFS Kernel Panic on 10.0-RELEASE
Message-ID:  <538F809F.5090705@bayphoto.com>
In-Reply-To: <C5AA7E9C30F043E593369059F40EC254@multiplay.co.uk>
References:  <5388D64D.4030400@bayphoto.com> <EC2EA442-56FC-46B4-A1E2-97523029B7B3@mail.turbofuzz.com> <5388E5B4.3030002@bayphoto.com> <538BBEB7.4070008@bayphoto.com> <782C34792E95484DBA631A96FE3BEF20@multiplay.co.uk> <538C9CF3.6070208@bayphoto.com> <16ADD4D9DC73403C9669D8F34FDBD316@multiplay.co.uk> <538CB3EA.9010807@bayphoto.com> <6C6FB182781541CEBF627998B73B1DB4@multiplay.co.uk> <538CC16A.6060207@bayphoto.com> <F959477921CD4552A94BF932A55961F4@multiplay.co.uk> <538CDB7F.2060408@bayphoto.com> <88B3A7562A5F4F9B9EEF0E83BCAD2FB0@multiplay.co.uk> <538CE2B3.8090008@bayphoto.com> <85184EB23AA84607A360E601D03E1741@multiplay.co.uk> <538D0174.6000906@bayphoto.com> <F445995D86AA44FB8497296E0D41AC8F@multiplay.co.uk> <538D18CB.5020906@bayphoto.com> <538D1CD5.5070902@bayphoto.com> <C7A3A403F308403B99A4BD39D48521D8@multiplay.co.uk> <538DF082.3030407@bayphoto.com> <538F699E.4060802@bayphoto.com> <C5AA7E9C30F043E593369059F40EC254@multiplay.co.uk>

Thanks Steve, I'll write up a summary for the openzfs-developers mailing 
list.

FWIW, the other server that is identical to this working-1 system was
built with a fresh 10.0-RELEASE install, and we haven't had any issues;
it's been up and running for months now.

On 6/4/2014 12:23 PM, Steven Hartland wrote:
> You mention mfi and 9.1, which rings alarm bells.
>
> They shouldn't be, but if your drives are > 2^32 sectors you'll
> have corruption:
> http://svnweb.freebsd.org/base?view=revision&revision=242497
>
> In addition to this I did a large number of fixes to mfi after
> this point which could result in all sorts of issues, but that
> doesn't explain issues with mps.
>
> Upgrading shouldn't have removed the cache file so I'm guessing
> that your initial install was already missing this.
>
> zdb is picky about having a cache file, which is something we
> should fix at some point as IIRC the changes avg or mav made,
> I can't remember which, mean that FreeBSD doesn't rely on the cache
> file being present as much as it did.
>
> Back to the corruption, unfortunately this could be any number
> of things so it's almost impossible to tell at which point the
> issue originally occurred :(
>
> It might well be worth emailing a summary of the issue to the
> openzfs mailing list to see if someone on there has any ideas
> where the DVA corruption could have occurred.
>
>    Regards
>    Steve
>
> ----- Original Message ----- From: "Mike Carlson" <mike@bayphoto.com>
> To: <freebsd-fs@freebsd.org>
> Sent: Wednesday, June 04, 2014 7:46 PM
> Subject: Re: ZFS Kernel Panic on 10.0-RELEASE
>
>
> Top-posting... sorry
>
> I'm going to have to roll this particular server back into production, 
> so I'll be rebuilding it from scratch
>
> That is okay for this particular system, but the other server that
> exhibited the same issue will have to have all 19TB of its usable data
> streamed off to temp storage (if we can get it) and rebuilt as well.
>
> Thank you Steve for being so helpful, and patient with me stumbling 
> through kgdb :)
>
>
> I have some lingering questions about the entire situation:
>
> First, these servers perform regular zpool scrubs (once a month), and
> have ECC memory. According to the additional logging information I
> was able to get from Steve's patch, it seems that even with these
> safeguards data was still corrupted. A scrub after the initial panic
> did not report any errors.
>
> Second, these two servers had an extra anomaly, and that was the
> missing zpool.cache. I say missing because zdb was unable to access
> the zpool until I ran
> "zpool set cachefile=/boot/zfs/zpool.cache <pool>". This was
> previously not an issue.
>
> The two servers were upgraded from 9.1 to 10 on the same morning,
> within minutes of each other. That is about it as far as
> commonalities. Both have different drive types (900GB SAS vs 2TB
> SATA), different controllers (Dell PERC (mfi) vs LSI (mps)), Dell vs
> SuperMicro boards...
>
> We do use the aio kernel module, as well as some sysctl and
> loader.conf tuning. I've backed all of those out, so we're just
> running a stock OS.
>
> Ideally, I would like to never run into this situation again. However, 
> I don't have any evidence to point to an upgrade misstep or some 
> catastrophic configuration error (kernel parameters, zpool create).
>
>
> Thanks everyone,
> Mike C
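
A rough sketch of the arithmetic behind the 2^32-sector limit quoted
above, assuming 512-byte sectors and vendor (decimal) capacities. The
function and variable names are illustrative and are not taken from the
mfi driver. It only covers the raw sizes of the two drive types mentioned
in this thread, and says nothing about logical volumes exported by a RAID
controller, which can be larger than a single disk.

```python
# Assumption: 512-byte sectors; 4K-native drives would change the numbers.
SECTOR_SIZE = 512

# The limit quoted in the thread: sector counts above 2^32.
SECTOR_LIMIT = 2 ** 32


def sectors(capacity_bytes: int, sector_size: int = SECTOR_SIZE) -> int:
    """Number of whole sectors on a device of the given capacity."""
    return capacity_bytes // sector_size


def exceeds_limit(capacity_bytes: int, sector_size: int = SECTOR_SIZE) -> bool:
    """True if the device has more than 2^32 sectors."""
    return sectors(capacity_bytes, sector_size) > SECTOR_LIMIT


# Capacity at which the limit is reached: 2^32 * 512 bytes = 2 TiB.
limit_bytes = SECTOR_LIMIT * SECTOR_SIZE

# The two drive types mentioned in the thread, using decimal units.
drive_900gb = 900 * 10**9
drive_2tb = 2 * 10**12

if __name__ == "__main__":
    print("limit in bytes:", limit_bytes)
    for name, size in (("900GB SAS", drive_900gb), ("2TB SATA", drive_2tb)):
        print(name, sectors(size), "sectors, over limit:", exceeds_limit(size))
```

Under these assumptions a single 2TB (decimal) drive comes in just below
2^32 sectors, so whether the limit applies depends on what size of device
the controller actually presents to the OS.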


