From: Andriy Gapon
Date: Mon, 23 Jan 2012 16:59:25 +0200
To: Martin Ranne
Cc: "freebsd-fs@freebsd.org"
Subject: Re: zpool import reboots computer
Message-ID: <4F1D75CD.6050000@FreeBSD.org>
In-Reply-To: <39C592E81AEC0B418EAD826FC1BBB09B255E15@mailgate>

on 23/01/2012 16:38 Martin Ranne said the following:
>> On 2012-01-21 15:20, Andriy Gapon wrote:
>> To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>> NULL although the code doesn't expect it. You can add some code to the function
>> to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>> Such a hack is probably not needed in general, but given that your pool could be
>> corrupted, this could be your chance to get access to it.
>>
>> BTW, restoring from backups is what is usually recommended first in a situation
>> like this.
>>
>
> I know it would be recommended first to restore from backup but there were backup failures.
>
> Am back after the weekend. I have done the hack in vdev_mirror_child_select function as per the code below.
>         if (mc->mc_tried || mc->mc_skipped)
>                 continue;
> # hack start
>         if (mc->mc_vd == NULL)
>                 break;
> # hack end
>         if (!vdev_readable(mc->mc_vd)) {
> I am not getting the fault virtual address at 0x38 and 0x88 but instead get two at 0x88. The function it stops at is zio_vdev_child_io. Is there another hack i could do there?

You could try a similar hack in vdev_mirror_io_start().  Please note that there
are two loops in there.
BTW, if you run kgdb /path/to/kernel/that/panicked, you can do e.g.
'info line *zio_vdev_child_io+0x25' to see on what line the trap occurred.

> Crash and bt below.
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1;
> apic id = 01
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0x88
> cpuid = 5; fault code = supervisor read data, page not present
> apic id = 05
> instruction pointer = 0x20:0xffffffff814a7ee5
> fault virtual address = 0x88
> stack pointer = 0x28:0xffffff8c0d564f00
> fault code = supervisor read data, page not present
> frame pointer = 0x28:0xffffff8c0d564f70
> instruction pointer = 0x20:0xffffffff814a7ee5
> code segment = base 0x0, limit 0xfffff, type 0x1b
> stack pointer = 0x28:0xffffff8c1009aad0
> = DPL 0, pres 1, long 1, def32 0, gran 1
> frame pointer = 0x28:0xffffff8c1009ab40
> processor eflags = code segment = base 0x0, limit 0xfffff, type 0x1b
> interrupt enabled, = DPL 0, pres 1, long 1, def32 0, gran 1
> resume, processor eflags = IOPL = 0
> interrupt enabled, current process = resume, 0 (system_taskq_3)
> I[ thread pid 0 tid 100099 ]
> Stopped at zio_vdev_child_io+0x25: cmpq $0, 0x88(%r10)
> db> bt
> Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
> zio_vdev_child_io() at zio_vdev_child_io+0x25
> vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
> zio_vdev_io_start() at zio_vdev_io_start+0x232
> zio_execute() at zio_execute+0xc3
> zio_gang_assemble() at zio_gang_assemble+0x1b
> zio_execute() at zio_execute+0xc3
> arc_read_nolock() at arc_read_nolock+0x6d1
> arc_read() at arc_read+0x93
> traverse_prefetcher() at traverse_prefetcher+0x103
> traverse_visitbp() at traverse_visitbp+0x21c
> traverse_dnode() at traverse_dnode+0x7c
> traverse_visitbp() at traverse_visitbp+0x3ff
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_dnode() at traverse_dnode+0x7c
> traverse_visitbp() at traverse_visitbp+0x48c
> traverse_prefetch_thread() at traverse_prefetch_thread+0x78
> taskq_run() at taskq_run+0x13
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x46
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
> db>
>
>
> //Martin Ranne

-- 
Andriy Gapon
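
For reference, the kind of NULL-guard hack discussed above could look roughly like the sketch below. It is not the ZFS source: vdev_t and mirror_child_t here are hypothetical cut-down stand-ins that keep only the fields named in the thread, the vdev_readable field replaces the real vdev_readable() call, and pick_child() is an invented name. The sketch uses continue so that the remaining children are still examined, whereas a break leaves the loop at the first NULL child; which of the two the original suggestion had in mind is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
typedef struct vdev {
	int vdev_readable;		/* replaces the real vdev_readable() check */
} vdev_t;

typedef struct mirror_child {
	vdev_t	*mc_vd;			/* may unexpectedly be NULL */
	int	mc_tried;
	int	mc_skipped;
} mirror_child_t;

/*
 * Return the index of the first usable child, or -1 if there is none.
 * Children with a NULL mc_vd are marked as skipped and passed over
 * instead of being dereferenced.
 */
static int
pick_child(mirror_child_t *children, int count)
{
	assert(children != NULL || count == 0);

	for (int c = 0; c < count; c++) {
		mirror_child_t *mc = &children[c];

		if (mc->mc_tried || mc->mc_skipped)
			continue;
		/* hack start: skip this child, keep looking at the rest */
		if (mc->mc_vd == NULL) {
			mc->mc_skipped = 1;
			continue;
		}
		/* hack end */
		if (!mc->mc_vd->vdev_readable) {
			mc->mc_skipped = 1;
			continue;
		}
		return (c);
	}
	return (-1);
}
```

The same guard shape would apply to each of the two loops in vdev_mirror_io_start(), placed before the child's mc_vd is handed to anything that dereferences it.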