From: Martin Ranne
To: Andriy Gapon
Cc: "freebsd-fs@freebsd.org"
Date: Mon, 23 Jan 2012 18:33:03 +0000
Subject: RE: zpool import reboots computer
Message-ID: <39C592E81AEC0B418EAD826FC1BBB09B25607F@mailgate>
In-Reply-To: <4F1D75CD.6050000@FreeBSD.org>

>On 2012-01-23 15:59, Andriy Gapon wrote:
>>on 23/01/2012 16:38 Martin Ranne said the following:
>>>To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>>>NULL although the code doesn't expect it. You can add some code to the function
>>>to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>>>Such a hack is probably not needed in general, but given that your pool could be
>>>corrupted, this could be your chance to get access to it.
>>>BTW, restoring from backups is what is usually recommended first in a situation
>>>like this.
>>I know it would be recommended first to restore from backup but there were backup failures.
>>I am back after the weekend. I have done the hack in the vdev_mirror_child_select function as per the code below.
>>if (mc->mc_tried || mc->mc_skipped)
>>    continue;
>>/* hack start */
>>if (mc->mc_vd == NULL)
>>    break;
>>/* hack end */
>>if (!vdev_readable(mc->mc_vd)) {
>>I am no longer getting faults at virtual addresses 0x38 and 0x88, but instead get two at 0x88. The function it stops at is zio_vdev_child_io. Is there another hack I could do there?
>You could try a similar hack in vdev_mirror_io_start().
>Please note that there are two loops in there.
>BTW, if you run kgdb /path/to/kernel/that/paniced, you can do e.g. 'info line
>*zio_vdev_child_io+0x25' to see on what line the trap occurred.
>I have now tried with the hack in vdev_mirror_io_start() like below and the one I previously did in vdev_mirror_child_select(). Unfortunately I get the same crash as I sent earlier today. It takes time to get into DDB for a crash as
>the computer freezes 19/20 times when I do the zpool import, and if I try to save a dump, the computer freezes, so I cannot use that.
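
For reference, the "skip a loop" suggestion quoted above can be sketched in isolation. This is only a minimal, self-contained model, not the ZFS source: the types sel_vdev_t and sel_child_t and the function sel_child_select() are hypothetical stand-ins that borrow the field names mc_vd, mc_tried and mc_skipped from the snippet above. It uses continue rather than break, so that children after the NULL one are still examined; whether that is safe in the real function is an assumption.

```c
#include <stddef.h>

/* Simplified stand-ins for the ZFS types; NOT the real definitions. */
typedef struct sel_vdev {
	int readable;		/* models the result of vdev_readable() */
} sel_vdev_t;

typedef struct sel_child {
	sel_vdev_t *mc_vd;	/* may unexpectedly be NULL */
	int mc_tried;
	int mc_skipped;
} sel_child_t;

/*
 * Return the index of the first usable child, or -1 if there is none.
 * A child whose mc_vd is NULL is marked as skipped and the loop moves
 * on to the next child instead of dereferencing the pointer.
 */
static int
sel_child_select(sel_child_t *child, int children)
{
	int c;

	for (c = 0; c < children; c++) {
		sel_child_t *mc = &child[c];

		if (mc->mc_tried || mc->mc_skipped)
			continue;
		if (mc->mc_vd == NULL) {	/* hack: no vdev behind this child */
			mc->mc_skipped = 1;
			continue;
		}
		if (!mc->mc_vd->readable) {
			mc->mc_tried = 1;
			mc->mc_skipped = 1;
			continue;
		}
		return (c);
	}
	return (-1);
}
```

With break in place of continue, the loop would stop at the first NULL child and never consider the remaining ones.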

I have done some checking and found that mc->mc_vd == NULL in the function vdev_mirror_io_start(), in the while loop:

while (children--) {
    mc = &mm->mm_child[c];
    zio_nowait(zio_vdev_child_io(zio, zio->io_bp,
        mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size,
        zio->io_type, zio->io_priority, 0,
        vdev_mirror_child_done, mc));
    c++;
}

If I set a break before it runs zio_nowait(), it will still crash the kernel.
What can I check next for it to be able to continue? Is it possible to have it ignore the child where mc_vd is NULL? I am also looking into what more I can do to debug it (adding code to print to the console, as I cannot use kernel dumps).

>>Crash and bt below.
>>Fatal trap 12: page fault while in kernel mode
>>cpuid = 1;
>>apic id = 01
>>Fatal trap 12: page fault while in kernel mode
>>fault virtual address = 0x88
>>cpuid = 5; fault code = supervisor read data, page not present
>>apic id = 05
>>instruction pointer = 0x20:0xffffffff814a7ee5
>>fault virtual address = 0x88
>>stack pointer = 0x28:0xffffff8c0d564f00
>>fault code = supervisor read data, page not present
>>frame pointer = 0x28:0xffffff8c0d564f70
>>instruction pointer = 0x20:0xffffffff814a7ee5
>>code segment = base 0x0, limit 0xfffff, type 0x1b
>>stack pointer = 0x28:0xffffff8c1009aad0
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>frame pointer = 0x28:0xffffff8c1009ab40
>>processor eflags = code segment = base 0x0, limit 0xfffff, type 0x1b
>>interrupt enabled, = DPL 0, pres 1, long 1, def32 0, gran 1
>>resume, processor eflags = IOPL = 0
>>interrupt enabled, current process = resume, 0 (system_taskq_3)
>>I[ thread pid 0 tid 100099 ]
>>Stopped at zio_vdev_child_io+0x25: cmpq $0, 0x88(%r10)
>>db> bt
>>Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
>>zio_vdev_child_io() at zio_vdev_child_io+0x25
>>vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
>>zio_vdev_io_start() at zio_vdev_io_start+0x232
>>zio_execute() at zio_execute+0xc3
>>zio_gang_assemble() at zio_gang_assemble+0x1b
>>zio_execute() at zio_execute+0xc3
>>arc_read_nolock() at arc_read_nolock+0x6d1
>>arc_read() at arc_read+0x93
>>traverse_prefetcher() at traverse_prefetcher+0x103
>>traverse_visitbp() at traverse_visitbp+0x21c
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x3ff
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x48c
>>traverse_prefetch_thread() at traverse_prefetch_thread+0x78
>>taskq_run() at taskq_run+0x13
>>taskqueue_run_locked() at taskqueue_run_locked+0x85
>>taskqueue_thread_loop() at taskqueue_thread_loop+0x46
>>fork_exit() at fork_exit+0x11f
>>fork_trampoline() at fork_trampoline+0xe
>>--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
>>db>
>>
>>
>>//Martin Ranne
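
On the question of ignoring a child where mc_vd is NULL, the idea can be sketched as a guard at the top of the issuing loop. This is a simplified model only: io_child_t and io_issue_children() are hypothetical stand-ins, the mc_issued flag merely marks the place where the real loop would call zio_nowait(zio_vdev_child_io(...)), and recording ENXIO for the skipped child is an assumption. Whether the rest of the mirror code tolerates a child that was never issued is not established in this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the mirror child; NOT the real ZFS definition. */
typedef struct io_child {
	void *mc_vd;		/* stands in for the vdev pointer, may be NULL */
	int mc_error;
	int mc_skipped;
	int mc_issued;		/* hypothetical: set where the child I/O would start */
} io_child_t;

/*
 * Walk `children` entries starting at index `c` and "issue" I/O only to
 * children that have a vdev. Returns the number of children issued.
 */
static int
io_issue_children(io_child_t *child, int c, int children)
{
	int issued = 0;

	while (children--) {
		io_child_t *mc = &child[c];

		c++;			/* advance before any continue */
		if (mc->mc_vd == NULL) {
			mc->mc_error = ENXIO;	/* assumption: treat as missing device */
			mc->mc_skipped = 1;
			continue;
		}
		mc->mc_issued = 1;	/* real code would start the child zio here */
		issued++;
	}
	return (issued);
}
```

Note that c is incremented before the NULL check; placing the guard above an increment that sits at the end of the loop body would otherwise re-test the same child on every pass.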