Date:      Mon, 23 Jan 2012 14:38:28 +0000
From:      Martin Ranne <martin.ranne@kockumsonics.com>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org>
Subject:   RE: zpool import reboots computer
Message-ID:  <39C592E81AEC0B418EAD826FC1BBB09B255E15@mailgate>
In-Reply-To: <4F1AC995.7050506@FreeBSD.org>
References:  <39C592E81AEC0B418EAD826FC1BBB09B25031D@mailgate> <4F18459F.7040309@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B252444@mailgate> <4F1858FE.7020509@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25253F@mailgate> <4F1878AC.6060704@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25284B@mailgate> <4F1AC995.7050506@FreeBSD.org>

>On 2012-01-21 15:20, Andriy Gapon wrote:
>>on 20/01/2012 11:09 Martin Ranne said the following:
>>I tried again to get into the debugger. It does not always work, as it freezes
>>before I get to the prompt most of the time, but here it is. Are there any other
>>commands I can run in the debugger to get better information to help solve this?

>>I used the command zpool import -F -f -o readonly=on -R /mnt/serv06 zroot

>>Result is the following
>>Fatal trap 12: page fault while in kernel mode
>>Fatal trap 12: page fault while in kernel mode
>>cpuid = 0; cpuid = 5; apic id = 00
>>apic id = 05
>>fault virtual address   = 0x38
>>fault virtual address   = 0x88
>>fault code              = supervisor read data, page not present
>>fault code              = supervisor read data, page not present
>>instruction pointer     = 0x20:0xffffffff814872a1
>>instruction pointer     = 0x20:0xffffffff814a7ef5
>>stack pointer           = 0x28:0xffffff8c0d564f00
>>stack pointer           = 0x28:0xffffff8c0ffd7ad0
>>frame pointer           = 0x28:0xffffff8c0d564f30
>>frame pointer           = 0x28:0xffffff8c0ffd7b40
>>code segment            = base 0x0, limit 0xfffff, type 0x1b
>>code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                        = DPL 0, pres 1, long 1, def32 0, gran 1
>>                        = DPL 0, pres 1, long 1, def32 0, gran 1
>>processor eflags        = processor eflags      = interrupt enabled,
>>interrupt enabled, resume, resume, IOPL = 0
>>IOPL = 0
>>current process         = current process               = 0 (system_task1_3)
>>26[ thread pid 0 tid 100099 ]
>>Stopped at      vdev_is_dead+0x1:       cmpq    $0x5,0x28(%rdi)
>>db> bt
>>Tracing pid 0 tid 100099 td 0xfffffe000e546460
>>vdev_is_dead() at vdev_is_dead+0x1
>>vdev_mirror_child_select() at vdev_mirror_child_select+0x67
>>vdev_mirror_io_start() at vdev_mirror_io_start+0x24c
>>zio_vdev_io_start() at zio_vdev_io_start+0x232
>>zio_execute() at zio_execute+0xc3
>>zio_gang_assemble() at zio_gang_assemble+0x1b
>>zio_execute() at zio_execute+0xc3
>>arc_read_nolock() at arc_read_nolock+0x6d1
>>arc_read() at arc_read+0x93
>>traverse_prefetcher() at traverse_prefetcher+0x103
>>traverse_visitbp() at traverse_visitbp+0x21c
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x3ff
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x48c
>>traverse_prefetch_thread() at traverse_prefetch_thread+0x78
>>taskq_run() at taskq_run+0x13
>>taskqueue_run_locked() at taskqueue_run_locked+0x85
>>taskqueue_thread_loop() at taskqueue_thread_loop+0x46
>>fork_exit() at fork_exit+0x11f
>>fork_trampoline() at fork_trampoline+0xe
>>--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
>>db>
>>
>
>To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>NULL although the code doesn't expect it.  You can add some code to the function
>to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>Such a hack is probably not needed in general, but given that your pool could be
>corrupted, this could be your chance to get access to it.
>
>BTW, restoring from backups is what is usually recommended first in a situation
>like this.
>

I know that restoring from backup would be the first recommendation, but there
were backup failures.

I am back after the weekend. I have added the hack to the
vdev_mirror_child_select function, as shown in the code below.

if (mc->mc_tried || mc->mc_skipped)
        continue;
/* hack start */
if (mc->mc_vd == NULL)
        break;
/* hack end */
if (!vdev_readable(mc->mc_vd)) {

With this change I no longer get faults at virtual addresses 0x38 and 0x88;
instead I get two faults, both at 0x88. The function it now stops in is
zio_vdev_child_io. Is there another hack I could do there?
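
To make the intent of the guard easier to discuss, here is a userland mock-up
of the selection loop. It is only a sketch and not the ZFS source: the struct
layouts, child_select() and this vdev_readable() are stand-ins invented for
illustration, and only the field names mc_vd, mc_tried and mc_skipped are taken
from the snippet above. It also uses continue instead of break, on the
assumption that later children should still be considered when one has a NULL
mc_vd.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; NOT the real ZFS definitions. */
typedef struct vdev {
	int readable;
} vdev_t;

typedef struct mirror_child {
	vdev_t	*mc_vd;		/* hypothesis: may be NULL on a damaged pool */
	int	 mc_tried;
	int	 mc_skipped;
} mirror_child_t;

/* Hypothetical stand-in for the kernel's vdev_readable(). */
static int
vdev_readable(const vdev_t *vd)
{
	return (vd->readable);
}

/*
 * Return the index of the first usable child, or -1 if there is none.
 * A child whose mc_vd is NULL is marked skipped and never dereferenced.
 */
static int
child_select(mirror_child_t *mc, int children)
{
	assert(mc != NULL);

	for (int c = 0; c < children; c++) {
		if (mc[c].mc_tried || mc[c].mc_skipped)
			continue;
		if (mc[c].mc_vd == NULL) {	/* the guard under discussion */
			mc[c].mc_skipped = 1;
			continue;
		}
		if (!vdev_readable(mc[c].mc_vd)) {
			mc[c].mc_skipped = 1;
			continue;
		}
		return (c);
	}
	return (-1);
}
```

In this model, a three-way mirror whose first two children have a NULL mc_vd
and whose third child is readable selects index 2. Whether the same choice is
right inside the kernel is a separate question, since the callers also use
mc_vd afterwards.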

Crash and bt below.
Fatal trap 12: page fault while in kernel mode
cpuid = 1;
apic id = 01
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x88
cpuid = 5; fault code           = supervisor read data, page not present
apic id = 05
instruction pointer     = 0x20:0xffffffff814a7ee5
fault virtual address   = 0x88
stack pointer           = 0x28:0xffffff8c0d564f00
fault code              = supervisor read data, page not present
frame pointer           = 0x28:0xffffff8c0d564f70
instruction pointer     = 0x20:0xffffffff814a7ee5
code segment            = base 0x0, limit 0xfffff, type 0x1b
stack pointer           = 0x28:0xffffff8c1009aad0
                        = DPL 0, pres 1, long 1, def32 0, gran 1
frame pointer           = 0x28:0xffffff8c1009ab40
processor eflags        = code segment          = base 0x0, limit 0xfffff, type 0x1b
interrupt enabled,                      = DPL 0, pres 1, long 1, def32 0, gran 1
resume, processor eflags        = IOPL = 0
interrupt enabled, current process              = resume, 0 (system_taskq_3)
I[ thread pid 0 tid 100099 ]
Stopped at      zio_vdev_child_io+0x25: cmpq    $0, 0x88(%r10)
db> bt
Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
zio_vdev_child_io() at zio_vdev_child_io+0x25
vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
zio_vdev_io_start() at zio_vdev_io_start+0x232
zio_execute() at zio_execute+0xc3
zio_gang_assemble() at zio_gang_assemble+0x1b
zio_execute() at zio_execute+0xc3
arc_read_nolock() at arc_read_nolock+0x6d1
arc_read() at arc_read+0x93
traverse_prefetcher() at traverse_prefetcher+0x103
traverse_visitbp() at traverse_visitbp+0x21c
traverse_dnode() at traverse_dnode+0x7c
traverse_visitbp() at traverse_visitbp+0x3ff
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_visitbp() at traverse_visitbp+0x316
traverse_dnode() at traverse_dnode+0x7c
traverse_visitbp() at traverse_visitbp+0x48c
traverse_prefetch_thread() at traverse_prefetch_thread+0x78
taskq_run() at taskq_run+0x13
taskqueue_run_locked() at taskqueue_run_locked+0x85
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
db>
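
The fault address and the faulting instruction fit together: cmpq $0, 0x88(%r10)
reads memory at r10 + 0x88, so a fault at virtual address 0x88 is what a NULL
pointer in r10 would produce. A small sketch of that arithmetic follows. The
structure fake_vdev and its member are made up for illustration; which real
structure and member sit at offset 0x88 is not something the trace tells us.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen so that the member lands at 0x88. */
struct fake_vdev {
	char		pad[0x88];
	uint64_t	field_at_0x88;
};

/* Compile-time check that the made-up layout matches the offset in the trace. */
_Static_assert(offsetof(struct fake_vdev, field_at_0x88) == 0x88,
    "member is expected at offset 0x88");

/*
 * Address the CPU would touch when reading base->field_at_0x88.
 * With base == 0 (a NULL pointer) this is simply the member offset.
 */
static uintptr_t
fault_address(uintptr_t base)
{
	return (base + offsetof(struct fake_vdev, field_at_0x88));
}
```

Calling fault_address(0) gives 0x88, matching the reported fault virtual
address.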


//Martin Ranne


