Date: Mon, 23 Jan 2012 18:33:03 +0000
From: Martin Ranne <martin.ranne@kockumsonics.com>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org>
Subject: RE: zpool import reboots computer
Message-ID: <39C592E81AEC0B418EAD826FC1BBB09B25607F@mailgate>
In-Reply-To: <4F1D75CD.6050000@FreeBSD.org>
References: <39C592E81AEC0B418EAD826FC1BBB09B25031D@mailgate> <4F18459F.7040309@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B252444@mailgate> <4F1858FE.7020509@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25253F@mailgate> <4F1878AC.6060704@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25284B@mailgate> <4F1AC995.7050506@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B255E15@mailgate> <4F1D75CD.6050000@FreeBSD.org>

>On 2012-01-23 15:59, Andriy Gapon wrote:
>>on 23/01/2012 16:38 Martin Ranne said the following:
>>>To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>>>NULL although the code doesn't expect it. You can add some code to the function
>>>to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>>>Such a hack is probably not needed in general, but given that your pool could be
>>>corrupted, this could be your chance to get access to it.
>>>BTW, restoring from backups is what is usually recommended first in a situation
>>>like this.
>>I know it would be recommended first to restore from backup but there were backup failures.
>>Am back after the weekend. I have done the hack in vdev_mirror_child_select function as per the code below.
>>if (mc->mc_tried || mc->mc_skipped)
>>	continue;
>># hack start
>>if (mc->mc_vd == NULL)
>>	break;
>># hack end
>>if (!vdev_readable(mc->mc_vd)) {
>>I am not getting the fault virtual address at 0x38 and 0x88 but instead get two at 0x88. The function it stops at is zio_vdev_child_io. Is there another hack I could do there?
>You could try a similar hack in vdev_mirror_io_start().
>Please note that there are two loops in there.
>BTW, if you run kgdb /path/to/kernel/that/paniced, you can do e.g. 'info line
>*zio_vdev_child_io+0x25' to see on what line the trap occurred.
>I have now tried with the hack in vdev_mirror_io_start() like below and the one I previously did in vdev_mirror_child_select(). Unfortunately I get the same crash as I sent earlier today. It takes time to get into DDB for a crash as the computer freezes 19/20 times when I do the zpool import and if I try to save a dump, the computer freezes so I can not use that.

Have done some checking and found mc->mc_vd == NULL in the function vdev_mirror_io_start() where the while-loop is.

while (children--) {
	mc = &mm->mm_child[c];
	zio_nowait(zio_vdev_child_io(zio, zio->io_bp,
	    mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size,
	    zio->io_type, zio->io_priority, 0,
	    vdev_mirror_child_done, mc));
	c++;
}

If I set a break before it runs zio_nowait() it will still crash the kernel.
What can I check next for it to be able to continue? Is it possible to have it ignore the child where mc_vd is NULL? I am also looking into what more I can do to debug it (adding code to print to console as I can not use kernel dumps).

>>Crash and bt below.
>>Fatal trap 12: page fault while in kernel mode
>>cpuid = 1;
>>apic id = 01
>>Fatal trap 12: page fault while in kernel mode
>>fault virtual address = 0x88
>>cpuid = 5; fault code = supervisor read data, page not present
>>apic id = 05
>>instruction pointer = 0x20:0xffffffff814a7ee5
>>fault virtual address = 0x88
>>stack pointer = 0x28:0xffffff8c0d564f00
>>fault code = supervisor read data, page not present
>>frame pointer = 0x28:0xffffff8c0d564f70
>>instruction pointer = 0x20:0xffffffff814a7ee5
>>code segment = base 0x0, limit 0xfffff, type 0x1b
>>stack pointer = 0x28:0xffffff8c1009aad0
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>frame pointer = 0x28:0xffffff8c1009ab40
>>processor eflags = code segment = base 0x0, limit 0xfffff, type 0x1b
>>interrupt enabled, = DPL 0, pres 1, long 1, def32 0, gran 1
>>resume, processor eflags = IOPL = 0
>>interrupt enabled, current process = resume, 0 (system_taskq_3)
>>I[ thread pid 0 tid 100099 ]
>>Stopped at zio_vdev_child_io+0x25: cmpq $0, 0x88(%r10)
>>db> bt
>>Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
>>zio_vdev_child_io() at zio_vdev_child_io+0x25
>>vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
>>zio_vdev_io_start() at zio_vdev_io_start+0x232
>>zio_execute() at zio_execute+0xc3
>>zio_gang_assemble() at zio_gang_assemble+0x1b
>>zio_execute() at zio_execute+0xc3
>>arc_read_nolock() at arc_read_nolock+0x6d1
>>arc_read() at arc_read+0x93
>>traverse_prefetcher() at traverse_prefetcher+0x103
>>traverse_visitbp() at traverse_visitbp+0x21c
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x3ff
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x48c
>>traverse_prefetch_thread() at traverse_prefetch_thread+0x78
>>taskq_run() at taskq_run+0x13
>>taskqueue_run_locked() at taskqueue_run_locked+0x85
>>taskqueue_thread_loop() at taskqueue_thread_loop+0x46
>>fork_exit() at fork_exit+0x11f
>>fork_trampoline() at fork_trampoline+0xe
>>--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
>>db>
>>
>>
>>//Martin Ranne
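
The question above, whether the loop can ignore a child whose mc_vd is NULL, can be illustrated with a small userland sketch. This is only a mock under stated assumptions: the mock_* types and mock_mirror_issue_all() are hypothetical stand-ins invented for this example and are not the real ZFS structures or functions. It shows the control flow only (check for NULL, print to the console, skip that child with continue and carry on with the rest). Whether doing the same inside the real vdev_mirror_io_start() is safe for the kernel's I/O bookkeeping is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins. NOT the real ZFS structures. */
struct mock_vdev {
	const char *name;
};

struct mock_mirror_child {
	struct mock_vdev *mc_vd;	/* assumed to be NULL for a damaged child */
	unsigned long mc_offset;
};

struct mock_mirror_map {
	int mm_children;
	struct mock_mirror_child *mm_child;
};

/*
 * Visit every child, as the while (children--) loop does, but skip any
 * child whose mc_vd is NULL instead of dereferencing it.
 * Returns the number of children that would have been issued I/O.
 */
static int
mock_mirror_issue_all(const struct mock_mirror_map *mm)
{
	int issued = 0;

	assert(mm != NULL);
	for (int c = 0; c < mm->mm_children; c++) {
		const struct mock_mirror_child *mc = &mm->mm_child[c];

		if (mc->mc_vd == NULL) {
			/* Console trace, since kernel dumps are not usable. */
			printf("child %d: mc_vd is NULL, skipping\n", c);
			continue;	/* skip this child only, keep going */
		}
		printf("child %d: issuing I/O to %s at offset %lu\n",
		    c, mc->mc_vd->name, mc->mc_offset);
		issued++;
	}
	return (issued);
}
```

Note the difference from the earlier hack: break leaves the loop and abandons all remaining children, while continue skips only the NULL one. Called on a three-child map whose middle child has a NULL mc_vd, the function above reports two children issued.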