Date:      Tue, 8 Dec 2015 17:09:33 +0100
From:      "Ranjan1018 ." <214748mv@gmail.com>
To:        Garrett Cooper <yaneurabeya@gmail.com>
Cc:        FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: Panic at shutdown
Message-ID:  <CACyC=qY_dwoCeNA%2BDpN2CvOVWuX0tj6wA8zgXq2adHU0MF0FLA@mail.gmail.com>
In-Reply-To: <055E0877-533A-4378-A306-FDE511543243@gmail.com>
References:  <CACyC=qbD7dGnwmUqCA=3aVtMu5K19BC=-HjZTX4FjKbS2stkSg@mail.gmail.com> <055E0877-533A-4378-A306-FDE511543243@gmail.com>

2015-11-29 0:10 GMT+01:00 Garrett Cooper <yaneurabeya@gmail.com>:

>
> > On Nov 28, 2015, at 12:32, Ranjan1018 . <214748mv@gmail.com> wrote:
> >
> > Hi,
> >
> > sometimes I have the panic in the photo at shutdown:
> >
> > http://imgur.com/mXrgFLp
> >
> > Unfortunately this happens randomly.
> >
> > I am running:
> >
> > $ uname -a
> >
> > FreeBSD ativ 11.0-CURRENT FreeBSD 11.0-CURRENT #3 r291160M: Sun Nov 22
> > 17:10:38 CET 2015     root@ativ:/usr/obj/usr/src/sys/GENERIC  amd64
>
> The panic is in the ZFS code.
>
> Have you run memtest on the machine recently?
>

Good suggestion. I ran memtest on my laptop for a few hours and it reported
no errors.

I have found the cause of the panic: an invalid offset.

The original function in
/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c is:

boolean_t
txg_list_member(txg_list_t *tl, void *p, uint64_t txg)
{
    int t = txg & TXG_MASK;
    txg_node_t *tn = (txg_node_t *)((char *)p + tl->tl_offset);

    return (tn->tn_member[t] != 0);
}

I modified the function to print tl->tl_offset once every 1000 calls, and
on every call where it differs from the two values I normally see (88 and
984):

boolean_t
txg_list_member(txg_list_t *tl, void *p, uint64_t txg)
{
    size_t ofs = tl->tl_offset;
    {
        /* Debug only: call counter, not protected by any lock. */
        static int cnt=0;
        /* Print periodically, and whenever the offset is unusual. */
        if ( (cnt++ % 1000) == 0
            || (ofs != 88 && ofs != 984) )
            printf("**** %d) tl->tl_offset %zu\n", cnt, ofs);
    }

    txg_node_t *tn = (txg_node_t *)((char *)p + ofs);

    return (tn->tn_member[txg & TXG_MASK] != 0);
}

I got the panic again, this time with an invalid tl->tl_offset of
16045693110842147038.
In /var/log/messages I have:

Dec  8 10:32:42 ativ kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
Dec  8 10:32:42 ativ kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Dec  8 10:32:42 ativ kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
Dec  8 10:32:42 ativ kernel: Syncing disks, vnodes remaining...0 0 0 done
Dec  8 10:32:42 ativ kernel: All buffers synced.
Dec  8 10:32:42 ativ kernel: **** 9692) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9693) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9694) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9695) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9708) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9709) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9710) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9711) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9720) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9721) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9722) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: **** 9723) tl->tl_offset 384
Dec  8 10:32:42 ativ kernel: Uptime: 1h57m42s
Dec  8 10:32:42 ativ kernel: **** 9736) tl->tl_offset 16045693110842147038
Dec  8 10:32:42 ativ kernel:
Dec  8 10:32:42 ativ kernel:
Dec  8 10:32:42 ativ kernel: Fatal trap 9: general protection fault while in kernel mode
Dec  8 10:32:42 ativ kernel: cpuid = 2; apic id = 02
Dec  8 10:32:42 ativ kernel: instruction pointer    = 0x20:0xffffffff8211b1cb
Dec  8 10:32:42 ativ kernel: stack pointer            = 0x28:0xfffffe0119525990
Dec  8 10:32:42 ativ kernel: frame pointer            = 0x28:0xfffffe01195259c0
Dec  8 10:32:42 ativ kernel: code segment        = base 0x0, limit 0xfffff, type 0x1b
Dec  8 10:32:42 ativ kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec  8 10:32:42 ativ kernel: processor eflags    = interrupt enabled, resume, IOPL = 0
Dec  8 10:32:42 ativ kernel: current process        = 0 (dbu_evict)

The panic is probably caused by accessing memory that has already been
freed: 16045693110842147038 in hex is 0xdeadc0dedeadc0de.
To fix the panic I need some tips from someone who knows the ZFS code
better than I do.

Thanks.

-- Maurizio