From: Rob VanHooren
Date: Sun, 20 Nov 2011 10:55:27 -0500
Message-Id: <274A19E2-6A04-4E74-A301-5DDFECA1EDEE@gmail.com>
To: freebsd-fs@freebsd.org
Subject: [zfs][panic] zio_free_issue at zfs mount of child dataset (ddt_phys_decref?)

Hello.

Looking for some input on solving a repeating panic at zfs-v28 mount of one of a raidz3 pool's child datasets.

This is amd64 8.2-STABLE running a freshly csup'd kernel build.

System details (obfuscated *.confs can be shared if required):

Xeon E5620, 24GB ECC.
2x hpt rr2720 sas2 controller
2x lsi 9211-8i sas2 controller (using lsi's driver, not mps)
1x hpt rr640 sata3 controller
HDD are HDS 5K3000 (raidz3)
SSD are OCZ Velocity3 (mirrored ZIL,L2ARC)

This pool is built on top of individual GELIs (so any *solaris pool hackery won't be of use).

The pool appears to import fine, zpool status -v is clean, and zdb (and zdb -cv) don't spout complaints ...

Mounting the parent works OK:

bsd# zfs mount thePOOL

Mounting some children works OK:

bsd# zfs mount thePOOL/dataset-foo
bsd# zfs mount thePOOL/dataset-bar
bsd# zfs mount thePOOL/dataset-baz

However, while attempting to mount one in particular ... kaboom!

bsd# zfs mount thePOOL/dataset-xyzzy

(order doesn't seem to matter, xyzzy is always the problem child...)

System immediately (and repeatably) panics.

Transcribed from camera shot:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x30
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80dc48b1
stack pointer           = 0x28:0xffffff89190d8b00
frame pointer           = 0x28:0xffffff89190d8b30
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_free_issue_1)
Stopped at ddt_phys_decref+0x1: subq $0x1,0x30(%rdi)

db> where
Tracing pid 0 tid 100356 td 0xffffff001b8a3000
ddt_phys_decref() at zio_execute+0xc3
taskqueue_run_locked() at taskqueue_run_locked+0x93
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xc

An alltrace snippet from db:

Tracing command zfs pid 117 tid 100296 td 0xffffff001b28a460
sched_switch() at ... {truncated}
mi_switch() at ...
sleepq_switch() at ...
sleepq_wait() at ...
_cv_wait() ...
zio_wait() ...
arc_read_nolock() ...
zil_read_log_data() ...
zil_replay_log_record() ...
zil_parse() ...
zil_replay() ...
zfsvfs_setup() ...
zfs_mount() ...
vfs_domount() ...
nmount() ...
amd64_syscall() ...
Xfast_syscall() ...
--- syscall (378, FreeBSD ELF64, nmount) ...

Haven't found any logged PRs that might match, and some hours of google-fu haven't led to enlightenment :-(

Would appreciate your thoughts on this
a) prior to opening a new PR, and
b) how to approach recovery (~4TB of data on this dataset, about 10% of the pool)

TIA for your effort,
R.