Date:      Fri, 18 May 2012 19:52:25 +0200
From:      Frank Bartels <freebsd@knarf.de>
To:        freebsd-fs@freebsd.org
Cc:        'Andriy Gapon' <avg@freebsd.org>
Subject:   Re: zpool import reboots computer
Message-ID:  <20120518175225.GA4735@server-king.de>
In-Reply-To: <39C592E81AEC0B418EAD826FC1BBB09B25CF08@mailgate>
References:  <4F1858FE.7020509@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25253F@mailgate> <4F1878AC.6060704@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25284B@mailgate> <4F1AC995.7050506@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B255E15@mailgate> <4F1D75CD.6050000@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25607F@mailgate> <4F1DC398.3050502@FreeBSD.org> <39C592E81AEC0B418EAD826FC1BBB09B25CF08@mailgate>

Hi freebsd-fs,

I have a problem similar to Martin's.

It started a while ago with a broken ZFS file system. I was no longer
able to delete some files on /home/ncvs:

Checking setuid files and devices:
find: /home/ncvs/del/efax/Attic/pkg-comment,v: Bad file descriptor
find: /home/ncvs/del/libsyncml/files: No such file or directory

Two days ago the machine started rebooting every two hours, directly
after syncing my local cvsup-server.

So I renamed the zfs /home/ncvs to /home/ncvs.del and tried to
destroy it including its snapshots. The machine crashed again and
now I'm unable to import the pool.

First I saw this backtrace:

https://www.server-king.de/download/DSC02742.medium.JPG

Then I added the three blocks from Martin's message (quoted below) to
vdev_mirror.c. It still crashes, but the backtrace has changed:

https://www.server-king.de/download/DSC02744.medium.JPG

...
calltrap
zio_checksum_verify
zio_execute
arc_read_nolock
arc_read
...

This is FreeBSD 8.3-RELEASE-p1 amd64 on a Xeon X5650 with 24 GByte
RAM and 12 hard disks and 2 SSDs.

This is what I see with zpool import -d /dev/gpt:

   pool: zdata
     id: 18141461787395278116
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        zdata               ONLINE
          raidz2-0          ONLINE
            gpt/zdata1.eli  ONLINE
            gpt/zdata0.eli  ONLINE
            gpt/zdata2.eli  ONLINE
            gpt/zdata3.eli  ONLINE
            gpt/zdata5.eli  ONLINE
            gpt/zdata4.eli  ONLINE
            gpt/zdata6.eli  ONLINE
            gpt/zdata8.eli  ONLINE
            gpt/zdata9.eli  ONLINE
        cache
          gpt/zcache0.eli
          gpt/zcache1.eli
        spares
          gpt/zdata7.eli
        logs
          mirror-1          ONLINE
            gpt/zlog0.eli   ONLINE
            gpt/zlog1.eli   ONLINE

I have no idea why I don't see zcaches and zdata7 as ONLINE.

If I use zpool import (without -d) I see dsk/gpt instead of gpt/
on these three disks:

        cache
          dsk/gpt/zcache0.eli
          dsk/gpt/zcache1.eli
        spares
          dsk/gpt/zdata7.eli

Do you have any idea what I can do? I've tried 9.0-RELEASE (LiveCD)
without success. Do you think using 8.3-STABLE or 9.0-STABLE could
cure my problem?

Thanks,
Knarf

On Wed, Jan 25, 2012 at 16:10:19 +0000, Martin Ranne wrote:
> Thank you to everyone who has helped me with hacking zfs. We have now been able to import the pool and have transferred all the data to another computer. The next step is to see if we can quickly repair the pool or just delete it and create it anew.
>
> We hacked the functions vdev_mirror_child_select() and vdev_mirror_io_start(). In vdev_mirror_io_start() we added the code below just after the mc pointer was set in both loops.
>
> if (mc->mc_vd == NULL) {
>     (void) printf("mc->mc_vd is NULL. Child %i\n", c);
>     continue;
> }
>
> In vdev_mirror_child_select(), we added the code below just after the mc pointer was set.
>
> if (mc->mc_vd == NULL) {
>     (void) printf("mc->mc_vd is NULL. Child %i\n", c);
>     mc->mc_tried = 1;
>     mc->mc_skipped = 1;
>     continue;
> }
>
>
> Best regards,
>
> Martin Ranne
>
> >On 2012-01-23 21:31, Andriy Gapon wrote:
> >>on 23/01/2012 20:33 Martin Ranne said the following:
> >>Have done some checking and found mc->mc_vd == NULL in the function vdev_mirror_io_start() where the while-loop is.
> >>
> >>while (children--) {
> >>    mc = &mm->mm_child[c];
> >>    zio_nowait(zio_vdev_child_io(zio, zio->io_bp,
> >>        mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size,
> >>        zio->io_type, zio->io_priority, 0,
> >>        vdev_mirror_child_done, mc));
> >>    c++;
> >>}
> >>
> >>If I set a break before it runs zio_nowait() it will still crash the kernel.
> >>What can I check next for it to be able to continue? Is it possible to have it ignore the child where mc_vd is NULL? I am also looking into what more I can do to debug it (adding code to print to the console, as I cannot use kernel dumps).
> >>
> >Not sure.  If by "set a break" you mean inserting a break statement, try
> >continue instead.
> >
>
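
A note on the NULL guard quoted above: if the "continue" is placed right
after mc is set inside the while (children--) loop exactly as quoted, it
appears to jump past the c++ at the bottom of the body, so the same NULL
child would be re-examined on every remaining iteration and later children
would never be issued. The following is only a minimal user-space sketch of
that control flow, not kernel code. The demo_child_t type and both function
names are hypothetical stand-ins, and the counter merely stands in for the
zio_nowait() call.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for illustration only; not the real ZFS structure. */
typedef struct demo_child {
    const void *mc_vd;   /* NULL models a child with no backing vdev */
    int mc_skipped;      /* set when the guard skips this child */
} demo_child_t;

/*
 * Variant A: guard with `continue` inside a while loop shaped like the
 * quoted one. `c++` sits at the bottom of the body, so `continue` jumps
 * past it and the index is not advanced for a NULL child.
 */
static int
issue_while_continue(demo_child_t *child, int children)
{
    int c = 0, issued = 0;

    assert(child != NULL);
    while (children--) {
        demo_child_t *mc = &child[c];
        if (mc->mc_vd == NULL) {
            mc->mc_skipped = 1;
            continue;          /* c is NOT advanced here */
        }
        issued++;              /* stands in for zio_nowait(...) */
        c++;
    }
    return (issued);
}

/*
 * Variant B: the index is advanced in the for-header, so `continue`
 * still moves on to the next child.
 */
static int
issue_for_continue(demo_child_t *child, int children)
{
    int issued = 0;

    assert(child != NULL);
    for (int c = 0; c < children; c++) {
        demo_child_t *mc = &child[c];
        if (mc->mc_vd == NULL) {
            mc->mc_skipped = 1;
            continue;
        }
        issued++;              /* stands in for zio_nowait(...) */
    }
    return (issued);
}
```

With three children where only the middle one has a NULL mc_vd, variant A
issues I/O to the first child only, while variant B issues I/O to the first
and third. Whether this matters in the real function depends on how many
children each code path iterates over, which this sketch does not model.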
