From owner-freebsd-stable@FreeBSD.ORG  Fri Sep 20 18:25:13 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 2AC448B6;
 Fri, 20 Sep 2013 18:25:13 +0000 (UTC)
 (envelope-from olivier777a7@gmail.com)
Received: from mail-la0-x22d.google.com (mail-la0-x22d.google.com
 [IPv6:2a00:1450:4010:c03::22d])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 395A128AD;
 Fri, 20 Sep 2013 18:25:12 +0000 (UTC)
Received: by mail-la0-f45.google.com with SMTP id eh20so645907lab.32
 for <multiple recipients>; Fri, 20 Sep 2013 11:25:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=778fNebCgKXrC2nOvAUX3nG2SC5QyO48uzLZH9egt1Q=;
 b=EKyrrkboPJSDa1xvDZ+YKMhz4iomlO5S9qlO+rmBkh8nss7cdLlenKSU7kkDlH9EtD
 F4xgGphc+zmR9PTby9V2fXJ+h1Dy1mVHVWsukWNS+yx1hfV+4pwkG/JLEZS0jLYDVtaN
 qsq7I9yxgfTvd2xPLaU5C0z+s8hKCkWKuAoIyBB4ymKno+7ttgj06u/pRIRCXtJth3wr
 InlwLF3/Zccjzei2Wi2FQe+xEUR/SfKlmnaW7x4CAkOKojIHz1BeVYiyhtVS9Z+2pmrV
 yYiZiPlte6jka0RLvl7UMN3Fc4c7To5wupaPtzs360NRka0/875AsVWklD7tMPQEpIS6
 n4hQ==
MIME-Version: 1.0
X-Received: by 10.112.51.101 with SMTP id j5mr7309599lbo.17.1379701510213;
 Fri, 20 Sep 2013 11:25:10 -0700 (PDT)
Received: by 10.114.176.69 with HTTP; Fri, 20 Sep 2013 11:25:10 -0700 (PDT)
In-Reply-To: <51E944B0.5080409@gmail.com>
References: <CALC5+1OCavSqJDMUysEuF=zCdEg646pH-i=p_1bK+yiVbY=xWQ@mail.gmail.com>
 <51E944B0.5080409@gmail.com>
Date: Fri, 20 Sep 2013 11:25:10 -0700
Message-ID: <CALC5+1MmfeyuMBxQBrzc15oQKm+Egi+WAKwTQ3epYG1heEfiVw@mail.gmail.com>
Subject: Re: 9.2PRERELEASE ZFS panic in lzjb_compress
From: olivier <olivier777a7@gmail.com>
To: Volodymyr Kostyrko <c.kworr@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: "freebsd-stable@freebsd.org" <stable@freebsd.org>, zfs-devel@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 20 Sep 2013 18:25:13 -0000

Got another, very similar panic on recent 9-STABLE (r255602); I
assume the latest 9.2 release candidate is affected too. Does anybody have
any idea what could be causing this, or know of a workaround other than
turning compression off?
Unlike the last panic I reported, this one did not occur during a zfs
send/receive operation. There were just a number of processes potentially
writing to disk at the same time.
All hardware is healthy as far as I can tell (memory is ECC and there are no
errors in the logs; zpool status and smartctl show no problems).

Fatal trap 12: page fault while in kernel mode


cpuid = 4; apic id = 24
cpuid = 51; apic id = 83
fault virtual address = 0xffffff8700a9cc65
fault virtual address = 0xffffff8700ab0ea9
fault code = supervisor read data, page not present

instruction pointer = 0x20:0xffffffff8195ff47
fault code = supervisor read data, page not present
stack pointer        = 0x28:0xffffffcf951390a0
Fatal trap 12: page fault while in kernel mode
frame pointer        = 0x28:0xffffffcf951398f0
Fatal trap 12: page fault while in kernel mode
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
instruction pointer = 0x20:0xffffffff8195ffa4
stack pointer        = 0x28:0xffffffcf951250a0
processor eflags = frame pointer        = 0x28:0xffffffcf951258f0
interrupt enabled, code segment = base 0x0, limit 0xfffff, type 0x1b

resume, IOPL = 0
cpuid = 28; apic id = 4c
Fatal trap 12: page fault while in kernel mode
= DPL 0, pres 1, long 1, def32 0, gran 1
current process = 0 (zio_write_issue_hig)
processor eflags = fault virtual address = 0xffffff8700aa22ac
interrupt enabled, fault code = supervisor read data, page not present
resume, IOPL = 0
trap number = 12
instruction pointer = 0x20:0xffffffff8195ffa4
current process = 0 (zio_write_issue_hig)
panic: page fault
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffcf95138b30
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffcf95138bf0
panic() at panic+0x1ce/frame 0xffffffcf95138cf0
trap_fatal() at trap_fatal+0x290/frame 0xffffffcf95138d50
trap_pfault() at trap_pfault+0x211/frame 0xffffffcf95138de0
trap() at trap+0x344/frame 0xffffffcf95138fe0
calltrap() at calltrap+0x8/frame 0xffffffcf95138fe0
--- trap 0xc, rip = 0xffffffff8195ff47, rsp = 0xffffffcf951390a0, rbp = 0xffffffcf951398f0 ---
lzjb_compress() at lzjb_compress+0xa7/frame 0xffffffcf951398f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffffffcf95139920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffffffcf95139970
zio_execute() at zio_execute+0xc3/frame 0xffffffcf951399b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffffffcf95139a00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xffffffcf95139a20
fork_exit() at fork_exit+0x11f/frame 0xffffffcf95139a70
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffcf95139a70
--- trap 0, rip = 0, rsp = 0xffffffcf95139b30, rbp = 0 ---
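
The interleaved trap output above also shows a second instruction pointer,
0x20:0xffffffff8195ffa4, from the other faulting CPUs. As a rough sketch, its
position can be worked out from the one (rip, offset) pair the backtrace does
give. This assumes that address also falls inside lzjb_compress; the helper
name func_offset is made up for illustration and is not part of any kernel
tool.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given one known pair from a backtrace
 * (known_rip is "function + known_off"), return the offset of another
 * instruction pointer relative to the start of the same function.
 * Assumes both addresses belong to the same function.
 */
static uint64_t
func_offset(uint64_t rip, uint64_t known_rip, uint64_t known_off)
{
	uint64_t func_start;

	assert(known_rip >= known_off);
	func_start = known_rip - known_off;	/* start of the function */
	assert(rip >= func_start);
	return (rip - func_start);
}
```

With the values from this trace, func_offset(0xffffffff8195ffa4,
0xffffffff8195ff47, 0xa7) gives 0x104, so the other CPUs would have faulted at
lzjb_compress+0x104 if the assumption holds.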


0x51f47 is in lzjb_compress
(/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c:74).
69          }
70          if (src > (uchar_t *)s_start + s_len - MATCH_MAX) {
71                  *dst++ = *src++;
72                  continue;
73          }
74          hash = (src[0] << 16) + (src[1] << 8) + src[2];
75          hash += hash >> 9;
76          hash += hash >> 5;
77          hp = &lempel[hash & (LEMPEL_SIZE - 1)];
78          offset = (intptr_t)(src - *hp) & OFFSET_MASK;
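
For anyone reading along, here is a minimal standalone sketch of what lines
74-77 of the listing compute. It is not the kernel code: the function name
lzjb_hash_index, the uint32_t type and the LEMPEL_SIZE value of 1024 are
assumptions made for illustration. The point is that the statement only reads
src[0], src[1] and src[2], and the check on line 70 keeps src at least
MATCH_MAX bytes before s_start + s_len. So, provided MATCH_MAX is at least 3,
a read fault on line 74 would suggest that part of the source buffer itself
is not mapped, rather than the compressor running past its end.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed table size for illustration only; must be a power of two. */
#define LEMPEL_SIZE 1024

/*
 * Hypothetical helper mirroring lines 74-77 of the listing: hash the
 * next three input bytes into an index into the match table.
 * Reads exactly src[0], src[1] and src[2].
 */
static size_t
lzjb_hash_index(const unsigned char *src)
{
	uint32_t hash;

	hash = ((uint32_t)src[0] << 16) + ((uint32_t)src[1] << 8) + src[2];
	hash += hash >> 9;
	hash += hash >> 5;
	return ((size_t)(hash & (LEMPEL_SIZE - 1)));
}
```

Under these assumptions, the three bytes {1, 2, 3} hash to index 664 and
{0, 0, 1} to index 1; the result is always below LEMPEL_SIZE.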

dmesg output is at http://pastebin.com/U34fwJ5f
kernel config is at http://pastebin.com/c9HKfcsz
I can provide more information if useful.
Thanks


On Fri, Jul 19, 2013 at 6:52 AM, Volodymyr Kostyrko <c.kworr@gmail.com> wrote:

> 19.07.2013 07:04, olivier wrote:
>
>> Hi,
>> Running 9.2-PRERELEASE #19 r253313 I got the following panic
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 22; apic id = 46
>> fault virtual address   = 0xffffff827ebca30c
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff81983055
>> stack pointer           = 0x28:0xffffffcf75bd60a0
>> frame pointer           = 0x28:0xffffffcf75bd68f0
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                          = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 0 (zio_write_issue_hig)
>> trap number             = 12
>> panic: page fault
>> cpuid = 22
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffcf75bd5b30
>> kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffcf75bd5bf0
>> panic() at panic+0x1ce/frame 0xffffffcf75bd5cf0
>> trap_fatal() at trap_fatal+0x290/frame 0xffffffcf75bd5d50
>> trap_pfault() at trap_pfault+0x211/frame 0xffffffcf75bd5de0
>> trap() at trap+0x344/frame 0xffffffcf75bd5fe0
>> calltrap() at calltrap+0x8/frame 0xffffffcf75bd5fe0
>> --- trap 0xc, rip = 0xffffffff81983055, rsp = 0xffffffcf75bd60a0, rbp = 0xffffffcf75bd68f0 ---
>> lzjb_compress() at lzjb_compress+0x185/frame 0xffffffcf75bd68f0
>> zio_compress_data() at zio_compress_data+0x92/frame 0xffffffcf75bd6920
>> zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffffffcf75bd6970
>> zio_execute() at zio_execute+0xc3/frame 0xffffffcf75bd69b0
>> taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffffffcf75bd6a00
>> taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xffffffcf75bd6a20
>> fork_exit() at fork_exit+0x11f/frame 0xffffffcf75bd6a70
>> fork_trampoline() at fork_trampoline+0xe/frame 0xffffffcf75bd6a70
>> --- trap 0, rip = 0, rsp = 0xffffffcf75bd6b30, rbp = 0 ---
>>
>> lzjb_compress+0x185 corresponds to line 85 in
>> 80 cpy = src - offset;
>> 81 if (cpy >= (uchar_t *)s_start && cpy != src &&
>> 82    src[0] == cpy[0] && src[1] == cpy[1] && src[2] == cpy[2]) {
>> 83 *copymap |= copymask;
>> 84 for (mlen = MATCH_MIN; mlen < MATCH_MAX; mlen++)
>> 85 if (src[mlen] != cpy[mlen])
>> 86 break;
>> 87 *dst++ = ((mlen - MATCH_MIN) << (NBBY - MATCH_BITS)) |
>> 88    (offset >> NBBY);
>> 89 *dst++ = (uchar_t)offset;
>>
>> I think it's the first time I've seen this panic. It happened while doing a
>> send/receive. I have two pools with lzjb compression; I don't know which of
>> these pools caused the problem, but one of them was the source of the
>> send/receive.
>>
>> I only have a textdump but I'm happy to try to provide more information
>> that could help anyone look into this.
>> Thanks
>> Olivier
>>
>
> Oh, I can add to this one. I have a full core dump of the same problem,
> caused by copying a large set of files from an lzjb-compressed pool to an
> lz4-compressed pool. vfs.zfs.recover was set.
>
> #1  0xffffffff8039d954 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:449
> #2  0xffffffff8039ddce in panic (fmt=<value optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:637
> #3  0xffffffff80620a6a in trap_fatal (frame=<value optimized out>,
>     eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:879
> #4  0xffffffff80620d25 in trap_pfault (frame=0x0, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:700
> #5  0xffffffff806204f6 in trap (frame=0xffffff821ca43600)
>     at /usr/src/sys/amd64/amd64/trap.c:463
> #6  0xffffffff8060a032 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:232
> #7  0xffffffff805a9367 in vm_page_alloc (object=0xffffffff80a34030,
>     pindex=16633, req=97) at /usr/src/sys/vm/vm_page.c:1445
> #8  0xffffffff8059c42e in kmem_back (map=0xfffffe00010000e8,
>     addr=18446743524021862400, size=16384, flags=<value optimized out>)
>     at /usr/src/sys/vm/vm_kern.c:362
> #9  0xffffffff8059c2ac in kmem_malloc (map=0xfffffe00010000e8, size=16384,
>     flags=257) at /usr/src/sys/vm/vm_kern.c:313
> #10 0xffffffff80595104 in uma_large_malloc (size=<value optimized out>,
>     wait=257) at /usr/src/sys/vm/uma_core.c:994
> #11 0xffffffff80386b80 in malloc (size=16384, mtp=0xffffffff80ea7c40, flags=0)
>     at /usr/src/sys/kern/kern_malloc.c:492
> #12 0xffffffff80c9e13c in lz4_compress (s_start=0xffffff80d0b19000,
>     d_start=0xffffff8159445000, s_len=131072, d_len=114688, n=-2)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c:843
> #13 0xffffffff80cdde25 in zio_compress_data (c=<value optimized out>,
>     src=<value optimized out>, dst=0xffffff8159445000, s_len=131072)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c:109
> #14 0xffffffff80cda012 in zio_write_bp_init (zio=0xfffffe0143a12000)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1107
> #15 0xffffffff80cd8ec6 in zio_execute (zio=0xfffffe0143a12000)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1305
> #16 0xffffffff803e25e6 in taskqueue_run_locked (queue=0xfffffe00060ca300)
>     at /usr/src/sys/kern/subr_taskqueue.c:312
> #17 0xffffffff803e2e38 in taskqueue_thread_loop (arg=<value optimized out>)
>     at /usr/src/sys/kern/subr_taskqueue.c:501
> #18 0xffffffff8036f40a in fork_exit (
>     callout=0xffffffff803e2da0 <taskqueue_thread_loop>,
>     arg=0xfffffe00060cc3d0, frame=0xffffff821ca43a80)
>     at /usr/src/sys/kern/kern_fork.c:988
> #19 0xffffffff8060a56e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:606
>
> I have a full crash dump in case someone wants to look at it.
>
> --
> Sphinx of black quartz, judge my vow.
>