From owner-freebsd-stable@FreeBSD.ORG Fri Sep 20 18:25:13 2013
Date: Fri, 20 Sep 2013 11:25:10 -0700
From: olivier
To: Volodymyr Kostyrko
Cc: "freebsd-stable@freebsd.org", zfs-devel@freebsd.org
Subject: Re: 9.2PRERELEASE ZFS panic in lzjb_compress
In-Reply-To: <51E944B0.5080409@gmail.com>
References: <51E944B0.5080409@gmail.com>

Got another, very similar panic again on recent 9-STABLE (r255602); I
assume the latest 9.2 release candidate is affected too. Anybody have any
idea of what could be causing this, and of a workaround other than turning
compression off?

Unlike the last panic I reported, this one did not occur during a zfs
send/receive operation. There were just a number of processes potentially
writing to disk at the same time.

All hardware is healthy as far as I can tell (memory is ECC and no errors
in logs; zpool status and smartctl show no problems).

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 24
cpuid = 51; apic id = 83
fault virtual address = 0xffffff8700a9cc65
fault virtual address = 0xffffff8700ab0ea9
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8195ff47
fault code = supervisor read data, page not present
stack pointer = 0x28:0xffffffcf951390a0
Fatal trap 12: page fault while in kernel mode
frame pointer = 0x28:0xffffffcf951398f0
Fatal trap 12: page fault while in kernel mode
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
instruction pointer = 0x20:0xffffffff8195ffa4
stack pointer = 0x28:0xffffffcf951250a0
processor eflags = frame pointer = 0x28:0xffffffcf951258f0
interrupt enabled, code segment = base 0x0, limit 0xfffff, type 0x1b
resume, IOPL = 0
cpuid = 28; apic id = 4c
Fatal trap 12: page fault while in kernel mode
= DPL 0, pres 1, long 1, def32 0, gran 1
current process = 0 (zio_write_issue_hig)
processor eflags = fault virtual address = 0xffffff8700aa22ac
interrupt enabled, fault code = supervisor read data, page not present
resume, IOPL = 0
trap number = 12
instruction pointer = 0x20:0xffffffff8195ffa4
current process = 0 (zio_write_issue_hig)
panic: page fault
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffcf95138b30
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffcf95138bf0
panic() at panic+0x1ce/frame 0xffffffcf95138cf0
trap_fatal() at trap_fatal+0x290/frame 0xffffffcf95138d50
trap_pfault() at trap_pfault+0x211/frame 0xffffffcf95138de0
trap() at trap+0x344/frame 0xffffffcf95138fe0
calltrap() at calltrap+0x8/frame 0xffffffcf95138fe0
--- trap 0xc, rip = 0xffffffff8195ff47, rsp = 0xffffffcf951390a0, rbp = 0xffffffcf951398f0 ---
lzjb_compress() at lzjb_compress+0xa7/frame 0xffffffcf951398f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffffffcf95139920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffffffcf95139970
zio_execute() at zio_execute+0xc3/frame 0xffffffcf951399b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffffffcf95139a00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xffffffcf95139a20
fork_exit() at fork_exit+0x11f/frame 0xffffffcf95139a70
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffcf95139a70
--- trap 0, rip = 0, rsp = 0xffffffcf95139b30, rbp = 0 ---

0x51f47 is in lzjb_compress
(/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c:74).

69              }
70              if (src > (uchar_t *)s_start + s_len - MATCH_MAX) {
71                      *dst++ = *src++;
72                      continue;
73              }
74              hash = (src[0] << 16) + (src[1] << 8) + src[2];
75              hash += hash >> 9;
76              hash += hash >> 5;
77              hp = &lempel[hash & (LEMPEL_SIZE - 1)];
78              offset = (intptr_t)(src - *hp) & OFFSET_MASK;

dmesg output is at http://pastebin.com/U34fwJ5f
kernel config is at http://pastebin.com/c9HKfcsz

I can provide more information if useful.
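
The listing above can be checked for bounds with a little arithmetic. The sketch below is not taken from lzjb.c: reaches_hash(), hash_last_read(), match_last_read(), reads_stay_in_bounds() and check_example() are hypothetical helpers, and the MATCH_BITS / MATCH_MIN / MATCH_MAX values are assumed from memory of lzjb.c rather than from anything in this report. It only models the guard at line 70 and asks whether the reads at line 74 (src[0] to src[2]) or in the match loop (up to src[MATCH_MAX - 1]) could go past s_start + s_len.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: values as recalled from lzjb.c, not confirmed by this thread. */
#define MATCH_BITS 6
#define MATCH_MIN  3
#define MATCH_MAX  ((1 << MATCH_BITS) + (MATCH_MIN - 1))   /* 66 */

/*
 * Hypothetical helper: returns 1 if, for a source offset src_off in a buffer
 * of s_len bytes, the guard at line 70 lets execution reach the hash at
 * line 74. The guard copies a literal when src > s_start + s_len - MATCH_MAX.
 */
int reaches_hash(size_t src_off, size_t s_len)
{
    if (s_len < MATCH_MAX)
        return 0;                       /* guard is always true: literals only */
    return src_off <= s_len - MATCH_MAX;
}

/* Highest source offset read by the hash at line 74, i.e. src[2]. */
size_t hash_last_read(size_t src_off)
{
    return src_off + 2;
}

/*
 * Highest source offset the match loop can read, i.e. src[MATCH_MAX - 1].
 * The cpy[] reads are lower still, because cpy >= s_start and cpy < src.
 */
size_t match_last_read(size_t src_off)
{
    return src_off + MATCH_MAX - 1;
}

/* Returns 1 if every read past the guard stays below s_len. */
int reads_stay_in_bounds(size_t s_len)
{
    for (size_t off = 0; off < s_len; off++) {
        if (!reaches_hash(off, s_len))
            continue;
        if (hash_last_read(off) >= s_len)
            return 0;
        if (match_last_read(off) >= s_len)
            return 0;
    }
    return 1;
}

/* Example check for a 128 KiB buffer. */
void check_example(void)
{
    assert(reads_stay_in_bounds(131072));
}
```

Under these assumptions the last offset that reaches the hash is s_len - MATCH_MAX, and the furthest read from there is s_len - 1, so the loop itself should not read past the buffer. If the constants are right, a fault at line 74 would point more towards the source buffer not being fully mapped than towards lzjb_compress overrunning it, but that is an inference from the listing, not a diagnosis.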

Thanks

On Fri, Jul 19, 2013 at 6:52 AM, Volodymyr Kostyrko wrote:

> 19.07.2013 07:04, olivier wrote:
>
>> Hi,
>> Running 9.2-PRERELEASE #19 r253313 I got the following panic
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 22; apic id = 46
>> fault virtual address = 0xffffff827ebca30c
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0xffffffff81983055
>> stack pointer = 0x28:0xffffffcf75bd60a0
>> frame pointer = 0x28:0xffffffcf75bd68f0
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 0 (zio_write_issue_hig)
>> trap number = 12
>> panic: page fault
>> cpuid = 22
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffcf75bd5b30
>> kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffcf75bd5bf0
>> panic() at panic+0x1ce/frame 0xffffffcf75bd5cf0
>> trap_fatal() at trap_fatal+0x290/frame 0xffffffcf75bd5d50
>> trap_pfault() at trap_pfault+0x211/frame 0xffffffcf75bd5de0
>> trap() at trap+0x344/frame 0xffffffcf75bd5fe0
>> calltrap() at calltrap+0x8/frame 0xffffffcf75bd5fe0
>> --- trap 0xc, rip = 0xffffffff81983055, rsp = 0xffffffcf75bd60a0, rbp = 0xffffffcf75bd68f0 ---
>> lzjb_compress() at lzjb_compress+0x185/frame 0xffffffcf75bd68f0
>> zio_compress_data() at zio_compress_data+0x92/frame 0xffffffcf75bd6920
>> zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffffffcf75bd6970
>> zio_execute() at zio_execute+0xc3/frame 0xffffffcf75bd69b0
>> taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffffffcf75bd6a00
>> taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xffffffcf75bd6a20
>> fork_exit() at fork_exit+0x11f/frame 0xffffffcf75bd6a70
>> fork_trampoline() at fork_trampoline+0xe/frame 0xffffffcf75bd6a70
>> --- trap 0, rip = 0, rsp = 0xffffffcf75bd6b30, rbp = 0 ---
>>
>> lzjb_compress+0x185 corresponds to line 85 in
>>
>> 80              cpy = src - offset;
>> 81              if (cpy >= (uchar_t *)s_start && cpy != src &&
>> 82                  src[0] == cpy[0] && src[1] == cpy[1] && src[2] == cpy[2]) {
>> 83                      *copymap |= copymask;
>> 84                      for (mlen = MATCH_MIN; mlen < MATCH_MAX; mlen++)
>> 85                              if (src[mlen] != cpy[mlen])
>> 86                                      break;
>> 87                      *dst++ = ((mlen - MATCH_MIN) << (NBBY - MATCH_BITS)) |
>> 88                          (offset >> NBBY);
>> 89                      *dst++ = (uchar_t)offset;
>>
>> I think it's the first time I've seen this panic. It happened while doing
>> a send/receive. I have two pools with lzjb compression; I don't know which
>> of these pools caused the problem, but one of them was the source of the
>> send/receive.
>>
>> I only have a textdump but I'm happy to try to provide more information
>> that could help anyone look into this.
>> Thanks
>> Olivier
>>
>
> Oh, I can add to this one. I have a full core dump of the same problem
> caused by copying a large set of files from an lzjb compressed pool to an
> lz4 compressed pool. vfs.zfs.recover was set.
>
> #1  0xffffffff8039d954 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:449
> #2  0xffffffff8039ddce in panic (fmt=)
>     at /usr/src/sys/kern/kern_shutdown.c:637
> #3  0xffffffff80620a6a in trap_fatal (frame=, eva=)
>     at /usr/src/sys/amd64/amd64/trap.c:879
> #4  0xffffffff80620d25 in trap_pfault (frame=0x0, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:700
> #5  0xffffffff806204f6 in trap (frame=0xffffff821ca43600)
>     at /usr/src/sys/amd64/amd64/trap.c:463
> #6  0xffffffff8060a032 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:232
> #7  0xffffffff805a9367 in vm_page_alloc (object=0xffffffff80a34030,
>     pindex=16633, req=97) at /usr/src/sys/vm/vm_page.c:1445
> #8  0xffffffff8059c42e in kmem_back (map=0xfffffe00010000e8,
>     addr=18446743524021862400, size=16384, flags=)
>     at /usr/src/sys/vm/vm_kern.c:362
> #9  0xffffffff8059c2ac in kmem_malloc (map=0xfffffe00010000e8, size=16384,
>     flags=257) at /usr/src/sys/vm/vm_kern.c:313
> #10 0xffffffff80595104 in uma_large_malloc (size=, wait=257)
>     at /usr/src/sys/vm/uma_core.c:994
> #11 0xffffffff80386b80 in malloc (size=16384, mtp=0xffffffff80ea7c40,
>     flags=0) at /usr/src/sys/kern/kern_malloc.c:492
> #12 0xffffffff80c9e13c in lz4_compress (s_start=0xffffff80d0b19000,
>     d_start=0xffffff8159445000, s_len=131072, d_len=114688, n=-2)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c:843
> #13 0xffffffff80cdde25 in zio_compress_data (c=, src=,
>     dst=0xffffff8159445000, s_len=131072)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c:109
> #14 0xffffffff80cda012 in zio_write_bp_init (zio=0xfffffe0143a12000)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1107
> #15 0xffffffff80cd8ec6 in zio_execute (zio=0xfffffe0143a12000)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1305
> #16 0xffffffff803e25e6 in taskqueue_run_locked (queue=0xfffffe00060ca300)
>     at /usr/src/sys/kern/subr_taskqueue.c:312
> #17 0xffffffff803e2e38 in taskqueue_thread_loop (arg=)
>     at /usr/src/sys/kern/subr_taskqueue.c:501
> #18 0xffffffff8036f40a in fork_exit (callout=0xffffffff803e2da0,
>     arg=0xfffffe00060cc3d0, frame=0xffffff821ca43a80)
>     at /usr/src/sys/kern/kern_fork.c:988
> #19 0xffffffff8060a56e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:606
>
> I have a full crash dump in case someone wants to look at it.
>
> --
> Sphinx of black quartz, judge my vow.