Date:      Mon, 23 Jan 2012 18:33:03 +0000
From:      Martin Ranne <martin.ranne@kockumsonics.com>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org>
Subject:   RE: zpool import reboots computer
Message-ID:  <39C592E81AEC0B418EAD826FC1BBB09B25607F@mailgate>
In-Reply-To: <4F1D75CD.6050000@FreeBSD.org>
References:  <39C592E81AEC0B418EAD826FC1BBB09B25031D@mailgate> <4F18459F.7040309@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B252444@mailgate> <4F1858FE.7020509@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25253F@mailgate> <4F1878AC.6060704@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25284B@mailgate> <4F1AC995.7050506@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B255E15@mailgate> <4F1D75CD.6050000@FreeBSD.org>

>On 2012-01-23 15:59, Andriy Gapon wrote:
>>on 23/01/2012 16:38 Martin Ranne said the following:
>>>To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>>>NULL although the code doesn't expect it.  You can add some code to the function
>>>to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>>>Such a hack is probably not needed in general, but given that your pool could be
>>>corrupted, this could be your chance to get access to it.

>>>BTW, restoring from backups is what is usually recommended first in a situation
>>>like this.

>>I know restoring from backup would normally come first, but there were backup failures.

>>I am back after the weekend. I have added the hack to the vdev_mirror_child_select() function, as shown in the code below:
>>if (mc->mc_tried || mc->mc_skipped)
>>        continue;
>>/* hack start */
>>if (mc->mc_vd == NULL)
>>        break;
>>/* hack end */
>>if (!vdev_readable(mc->mc_vd)) {
>>I no longer get faults at virtual addresses 0x38 and 0x88; instead I get two faults, both at 0x88. The function it stops in is zio_vdev_child_io(). Is there another hack I could do there?
>You could try a similar hack in vdev_mirror_io_start().
>Please note that there are two loops in there.

>BTW, if you run kgdb /path/to/kernel/that/panicked, you can do e.g.
>'info line *zio_vdev_child_io+0x25' to see on what line the trap occurred.
>I have now tried with the hack in vdev_mirror_io_start() as below, together with the one I previously added in vdev_mirror_child_select(). Unfortunately, I get the same crash as the one I sent earlier today. It takes time to get into DDB for a crash, as the computer freezes 19 times out of 20 when I run zpool import, and if I try to save a dump the computer freezes, so I cannot use that.

I have done some checking and found that mc->mc_vd == NULL in vdev_mirror_io_start(), inside the while loop:

while (children--) {
    mc = &mm->mm_child[c];
    zio_nowait(zio_vdev_child_io(zio, zio->io_bp,
        mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size,
        zio->io_type, zio->io_priority, 0,
        vdev_mirror_child_done, mc));
    c++;
}

If I insert a break before zio_nowait() is called, the kernel still crashes.
What can I check next to get past this? Is it possible to have it ignore the child whose mc_vd is NULL? I am also looking into what more I can do to debug it (adding code that prints to the console, as I cannot use kernel dumps).


>>Crash and bt below.
>>Fatal trap 12: page fault while in kernel mode
>>cpuid = 1;
>>apic id = 01
>>Fatal trap 12: page fault while in kernel mode
>>fault virtual address   = 0x88
>>cpuid = 5; fault code           = supervisor read data, page not present
>>apic id = 05
>>instruction pointer     = 0x20:0xffffffff814a7ee5
>>fault virtual address   = 0x88
>>stack pointer           = 0x28:0xffffff8c0d564f00
>>fault code              = supervisor read data, page not present
>>frame pointer           = 0x28:0xffffff8c0d564f70
>>instruction pointer     = 0x20:0xffffffff814a7ee5
>>code segment            = base 0x0, limit 0xfffff, type 0x1b
>>stack pointer           = 0x28:0xffffff8c1009aad0
>>                        = DPL 0, pres 1, long 1, def32 0, gran 1
>>frame pointer           = 0x28:0xffffff8c1009ab40
>>processor eflags        = code segment          = base 0x0, limit 0xfffff, type 0x1b
>>interrupt enabled,                      = DPL 0, pres 1, long 1, def32 0, gran 1
>>resume, processor eflags        = IOPL = 0
>>interrupt enabled, current process              = resume, 0 (system_taskq_3)
>>I[ thread pid 0 tid 100099 ]
>>Stopped at      zio_vdev_child_io+0x25: cmpq    $0, 0x88(%r10)
>>db> bt
>>Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
>>zio_vdev_child_io() at zio_vdev_child_io+0x25
>>vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
>>zio_vdev_io_start() at zio_vdev_io_start+0x232
>>zio_execute() at zio_execute+0xc3
>>zio_gang_assemble() at zio_gang_assemble+0x1b
>>zio_execute() at zio_execute+0xc3
>>arc_read_nolock() at arc_read_nolock+0x6d1
>>arc_read() at arc_read+0x93
>>traverse_prefetcher() at traverse_prefetcher+0x103
>>traverse_visitbp() at traverse_visitbp+0x21c
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x3ff
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x48c
>>traverse_prefetch_thread() at traverse_prefetch_thread+0x78
>>taskq_run() at taskq_run+0x13
>>taskqueue_run_locked() at taskqueue_run_locked+0x85
>>taskqueue_thread_loop() at taskqueue_thread_loop+0x46
>>fork_exit() at fork_exit+0x11f
>>fork_trampoline() at fork_trampoline+0xe
>>--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
>>db>
>>
>>
>>//Martin Ranne


