Date:      Mon, 23 Jan 2012 16:59:25 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Martin Ranne <martin.ranne@kockumsonics.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org>
Subject:   Re: zpool import reboots computer
Message-ID:  <4F1D75CD.6050000@FreeBSD.org>
In-Reply-To: <39C592E81AEC0B418EAD826FC1BBB09B255E15@mailgate>
References:  <39C592E81AEC0B418EAD826FC1BBB09B25031D@mailgate> <4F18459F.7040309@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B252444@mailgate> <4F1858FE.7020509@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25253F@mailgate> <4F1878AC.6060704@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25284B@mailgate> <4F1AC995.7050506@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B255E15@mailgate>

on 23/01/2012 16:38 Martin Ranne said the following:
>> On 2012-01-21 15:20, Andriy Gapon wrote: 
>> To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>> NULL although the code doesn't expect it.  You can add some code to the function
>> to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>> Such a hack is probably not needed in general, but given that your pool could be
>> corrupted, this could be your chance to get access to it.
>>
>> BTW, restoring from backups is what is usually recommended first in a situation
>> like this.
>>
> 
> I know it would be recommended first to restore from backup but there were backup failures.
> 
> I am back after the weekend. I have applied the hack in the vdev_mirror_child_select function, as shown in the code below.
> if (mc->mc_tried || mc->mc_skipped)
>         continue;
> /* hack start */
> if (mc->mc_vd == NULL)
>         break;
> /* hack end */
> if (!vdev_readable(mc->mc_vd)) {
> I no longer get faults at virtual addresses 0x38 and 0x88; instead I get two at 0x88. The function it stops at is zio_vdev_child_io. Is there another hack I could do there?

You could try a similar hack in vdev_mirror_io_start().
Please note that there are two loops in there.
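
To illustrate the kind of guard meant here, below is a minimal stand-alone sketch. It is not the real ZFS code: struct vdev, struct mirror_child and count_usable_children() are simplified, hypothetical stand-ins, and only the field names mc_vd, mc_tried and mc_skipped come from the snippet quoted above. The point is simply that each loop over the mirror children checks mc_vd for NULL before anything dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real ZFS structures are much larger. */
struct vdev {
	int id;
};

struct mirror_child {
	struct vdev *mc_vd;	/* hypothesis: may be NULL on a damaged pool */
	int mc_tried;
	int mc_skipped;
};

/*
 * Count the children that would be safe to issue I/O to.
 * A child with a NULL mc_vd is skipped instead of being dereferenced.
 */
static int
count_usable_children(const struct mirror_child *mc, int children)
{
	int c, usable = 0;

	assert(children >= 0);
	for (c = 0; c < children; c++) {
		if (mc[c].mc_tried || mc[c].mc_skipped)
			continue;
		if (mc[c].mc_vd == NULL)	/* the hack: skip this child */
			continue;
		usable++;
	}
	return (usable);
}
```

The sketch uses continue so that later children are still examined; whether continue or break is the better choice in the real functions depends on what the surrounding code does afterwards.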

BTW, if you run kgdb /path/to/kernel/that/panicked, you can do e.g. 'info line
*zio_vdev_child_io+0x25' to see on what line the trap occurred.
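
As a side note on reading the trap: the instruction it stops at, cmpq $0, 0x88(%r10), together with a fault virtual address of 0x88, is consistent with %r10 holding a NULL pointer and the code reading a field at offset 0x88 from it. Which structure and field that is cannot be told from the trace alone; that is what 'info line' should reveal. The sketch below only demonstrates the arithmetic, using a hypothetical struct fake_vdev whose padding is chosen so that a field lands at 0x88.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen so the field lands at offset 0x88. */
struct fake_vdev {
	char pad[0x88];
	uint64_t field_at_0x88;
};

static_assert(offsetof(struct fake_vdev, field_at_0x88) == 0x88,
    "layout assumption for this illustration");

/*
 * Address that a read of field_at_0x88 would touch for a given base
 * pointer value.  With a NULL (zero) base this is the offset itself.
 */
static uintptr_t
field_address(uintptr_t base)
{
	return (base + offsetof(struct fake_vdev, field_at_0x88));
}
```

Calling field_address(0) yields 0x88, matching the pattern of a small non-zero fault address that a field access through a NULL structure pointer typically produces.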

> Crash and bt below.
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1;
> apic id = 01
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x88
> cpuid = 5; fault code           = supervisor read data, page not present
> apic id = 05
> instruction pointer     = 0x20:0xffffffff814a7ee5
> fault virtual address   = 0x88
> stack pointer           = 0x28:0xffffff8c0d564f00
> fault code              = supervisor read data, page not present
> frame pointer           = 0x28:0xffffff8c0d564f70
> instruction pointer     = 0x20:0xffffffff814a7ee5
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> stack pointer           = 0x28:0xffffff8c1009aad0
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> frame pointer           = 0x28:0xffffff8c1009ab40
> processor eflags        = code segment          = base 0x0, limit 0xfffff, type 0x1b
> interrupt enabled,                      = DPL 0, pres 1, long 1, def32 0, gran 1
> resume, processor eflags        = IOPL = 0
> interrupt enabled, current process              = resume, 0 (system_taskq_3)
> I[ thread pid 0 tid 100099 ]
> Stopped at      zio_vdev_child_io+0x25: cmpq    $0, 0x88(%r10)
> db> bt
> Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
> zio_vdev_child_io() at zio_vdev_child_io+0x25
> vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
> zio_vdev_io_start() at zio_vdev_io_start+0x232
> zio_execute() at zio_execute+0xc3
> zio_gang_assemble() at zio_gang_assemble+0x1b
> zio_execute() at zio_execute+0xc3
> arc_read_nolock() at arc_read_nolock+0x6d1
> arc_read() at arc_read+0x93
> traverse_prefetcher() at traverse_prefetcher+0x103
> traverse_visitbp() at traverse_visitbp+0x21c
> traverse_dnode() at traverse_dnode+0x7c
> traverse_visitbp() at traverse_visitbp+0x3ff
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_visitbp() at traverse_visitbp+0x316
> traverse_dnode() at traverse_dnode+0x7c
> traverse_visitbp() at traverse_visitbp+0x48c
> traverse_prefetch_thread() at traverse_prefetch_thread+0x78
> taskq_run() at taskq_run+0x13
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x46
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
> db>
> 
> 
> //Martin Ranne


-- 
Andriy Gapon


