From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 11:07:00 2008
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 222571065670 for ; Mon, 31 Mar 2008 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 1365E8FC14 for ; Mon, 31 Mar 2008 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2VB6xm6038892 for ; Mon, 31 Mar 2008 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m2VB6xAq038888 for freebsd-fs@FreeBSD.org; Mon, 31 Mar 2008 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 31 Mar 2008 11:06:59 GMT
Message-Id: <200803311106.m2VB6xAq038888@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 11:07:00 -0000

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha

4 problems total.
Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime

6 problems total.

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 12:06:59 2008
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DCED106574C for ; Mon, 31 Mar 2008 12:06:57 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id C9C898FC17 for ; Mon, 31 Mar 2008 12:06:56 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from [172.16.129.140] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id 32DC6AF422 for ; Mon, 31 Mar 2008 13:51:13 +0200 (CEST)
Message-ID: <47F0D02B.8060504@fsn.hu>
Date: Mon, 31 Mar 2008 13:51:07 +0200
From: Attila Nagy
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc:
Subject: ZFS hangs very often
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 12:06:59 -0000

Hello,

On my desktop machine I use a ZFS pool for everything but the swap and
the root fs (so /usr, /tmp, and everything else is on ZFS; swap and /
are on a gmirror of two partitions each).
The first setup was FreeBSD/i386 7-STABLE, and the pool consisted of
two partitions from two SATA disks, each encrypted individually with
GELI.

After some weeks of use (without any particularly heavy IO activity,
just normal work on the machine), the first hang came: I couldn't move
the mouse under X, but remote sessions stayed alive and the clock app
kept counting the time fine. I couldn't log into the machine via ssh:
the port was open, but I never got the banner. I was running a
portupgrade at the time.

After this (and several other) hangs, I removed GELI from the equation,
without success. Then I went to one partition (disk) instead of two.
Now I am running amd64 instead of i386, and the problem still persists.

I've attached my notebook to the machine, and here is what I got during
the hang (I am currently in the process of upgrading some ports; a
configure script is trying to run, but the machine has stopped):

KDB: enter: manual escape to debugger
[thread pid 23 tid 100022 ]
Stopped at kdb_enter+0x31: leave
db> bt
Tracing pid 23 tid 100022 td 0xffffff000127c350
kdb_enter() at kdb_enter+0x31
scgetc() at scgetc+0x461
sckbdevent() at sckbdevent+0xa4
kbdmux_intr() at kbdmux_intr+0x43
kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
taskqueue_run() at taskqueue_run+0x9f
ithread_loop() at ithread_loop+0x180
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb4f04d30, rbp = 0 ---
db> ps
  pid  ppid  pgrp   uid  state  wmesg    wchan               cmd
77873 77871 76757     0  S+     piperd   0xffffff0010036ba0  as
77872 77871 76757     0  S+     zfs:(&zi 0xffffff0020ba0298  cc1plus
77871 77870 76757     0  S+     wait     0xffffff0001446000  c++
77870 77321 76757     0  S+     wait     0xffffff000eba7468  sh
77321 76882 76757     0  S+     wait     0xffffff00396448d0  sh
76882 76881 76757     0  S+     wait     0xffffff000eba5468  sh
76881 76757 76757     0  S+     wait     0xffffff00014b28d0  sh
76757 76755 76757     0  Ss+    wait     0xffffff001b369000  make
76755 62725 62725     0  S+     select   0xffffffff80a89d50  script
62725   817 62725     0  S+     wait     0xffffff0001f43000  initial thread
86757 86674 86757  1001  SL+    pfault   0xffffffff80a9a79c  ssh
86674 86672 86674  1001  Ss+    wait     0xffffff0001f428d0  bash
86672 86670 86670  1001  S      select   0xffffffff80a89d50  sshd
86670   721 86670     0  Ss     sbwait   0xffffff0001d2015c  sshd
62788   802 62788     0  S+     ttyin    0xffffff00013bac10  csh
46310   801 46310     0  ?+                                  csh
  817   800   817     0  S+     pause    0xffffff0001f420c0  csh
  807     1   807     0  Ss+    ttyin    0xffffff000139e810  getty
  806     1   806     0  Ss+    ttyin    0xffffff00013bb410  getty
  805     1   805     0  Ss+    ttyin    0xffffff00013ba810  getty
  804     1   804     0  Ss+    ttyin    0xffffff00013ba010  getty
  803     1   803     0  Ss+    ttyin    0xffffff00013b9410  getty
  802     1   802     0  Ss+    wait     0xffffff00014b08d0  login
  801     1   801     0  Ss+    wait     0xffffff0001567468  login
  800     1   800     0  Ss+    wait     0xffffff00014b0468  login
  737     1   737     0  ?s                                  cron
  731     1   731    25  Ss     pause    0xffffff00015650c0  sendmail
  727     1   727     0  ?s                                  sendmail
  721     1   721     0  Ss     select   0xffffffff80a89d50  sshd
  688   687   687   123  ?                                   ntpd
  687     1   687     0  Ss     select   0xffffffff80a89d50  ntpd
  668     1   668     0  Ss     select   0xffffffff80a89d50  powerd
  559     1   559     0  ?s                                  syslogd
  488     1   488     0  Ss     select   0xffffffff80a89d50  devd
  440     1   440     0  Ss     select   0xffffffff80a89d50  moused
  248     1   248     0  Ss     pause    0xffffff00015640c0  adjkerntz
  175     0     0     0  SL     zfs:(&tq 0xffffff0001583080  [zil_clean]
  174     0     0     0  SL     zfs:(&tq 0xffffff00015831c0  [zil_clean]
  173     0     0     0  SL     zfs:(&tq 0xffffff0001583300  [zil_clean]
  172     0     0     0  SL     zfs:(&tq 0xffffff0001583440  [zil_clean]
  171     0     0     0  SL     zfs:(&tq 0xffffff0001583580  [zil_clean]
  170     0     0     0  SL     zfs:(&tq 0xffffff00015836c0  [zil_clean]
  168     0     0     0  SL     zfs:(&tx 0xffffff000146c590  [txg_thread_enter]
  167     0     0     0  SL     zfs:(&zi 0xffffff000c717d58  [txg_thread_enter]
  166     0     0     0  SL     zfs:(&tx 0xffffff00014e0a40  [txg_thread_enter]
  165     0     0     0  SL     vgeom:io 0xffffff000145c410  [vdev:worker ad0s1d]
  164     0     0     0  SL     zfs:(&tq 0xffffff000158e300  [spa_zio_intr_5]
  163     0     0     0  SL     zfs:(&tq 0xffffff000158e300  [spa_zio_intr_5]
  162     0     0     0  SL     zfs:(&tq 0xffffff000158e1c0  [spa_zio_issue_5]
  161     0     0     0  SL     zfs:(&tq 0xffffff000158e1c0  [spa_zio_issue_5]
  160     0     0     0  SL     zfs:(&tq 0xffffff0001227d00  [spa_zio_intr_4]
  159     0     0     0  SL     zfs:(&tq 0xffffff0001227d00  [spa_zio_intr_4]
  158     0     0     0  SL     zfs:(&tq 0xffffff0001227bc0  [spa_zio_issue_4]
  157     0     0     0  SL     zfs:(&tq 0xffffff0001227bc0  [spa_zio_issue_4]
  156     0     0     0  SL     zfs:(&tq 0xffffff0001227a80  [spa_zio_intr_3]
  155     0     0     0  SL     zfs:(&tq 0xffffff0001227a80  [spa_zio_intr_3]
  154     0     0     0  SL     zfs:(&tq 0xffffff0001227940  [spa_zio_issue_3]
  153     0     0     0  SL     zfs:(&tq 0xffffff0001227940  [spa_zio_issue_3]
  152     0     0     0  SL     zfs:&vq- 0xffffff00015d0c88  [spa_zio_intr_2]
  151     0     0     0  SL     vmwait   0xffffffff80a9a79c  [spa_zio_intr_2]
  150     0     0     0  SL     vmwait   0xffffffff80a9a79c  [spa_zio_issue_2]
  149     0     0     0  SL     vmwait   0xffffffff80a9a79c  [spa_zio_issue_2]
  148     0     0     0  SL     zfs:(&tq 0xffffff0001227580  [spa_zio_intr_1]
  147     0     0     0  SL     zfs:(&tq 0xffffff0001227580  [spa_zio_intr_1]
  146     0     0     0  SL     zfs:(&tq 0xffffff0001227440  [spa_zio_issue_1]
  145     0     0     0  SL     zfs:(&tq 0xffffff0001227440  [spa_zio_issue_1]
  144     0     0     0  SL     zfs:(&tq 0xffffff00012271c0  [spa_zio_intr_0]
  143     0     0     0  SL     zfs:(&tq 0xffffff00012271c0  [spa_zio_intr_0]
  142     0     0     0  SL     zfs:(&tq 0xffffff0001227300  [spa_zio_issue_0]
  141     0     0     0  SL     zfs:(&tq 0xffffff0001227300  [spa_zio_issue_0]
   87     0     0     0  SL     vmwait   0xffffffff80a9a79c  [g_eli[1] mirror/swa]
   86     0     0     0  SL     vmwait   0xffffffff80a9a79c  [g_eli[0] mirror/swa]
   53     0     0     0  SL     sdflush  0xffffffff80a99d88  [softdepflush]
   52     0     0     0  SL     vlruwt   0xffffff0001448000  [vnlru]
   51     0     0     0  SL     zfs:&vq- 0xffffff00015d0c88  [syncer]
   50     0     0     0  SL     psleep   0xffffffff80a8a55c  [bufdaemon]
   49     0     0     0  SL     pgzero   0xffffffff80a9b804  [pagezero]
   48     0     0     0  SL     psleep   0xffffffff80a9ab48  [vmdaemon]
   47     0     0     0  SL     wswbuf0  0xffffffff80a9a004  [pagedaemon]
   46     0     0     0  SL     m:w1     0xffffff0001401200  [g_mirror swap]
   45     0     0     0  SL     m:w1     0xffffff00013c3800  [g_mirror root]
   44     0     0     0  SL     zfs:(&ar 0xffffffff80c746b0  [arc_reclaim_thread]
   43     0     0     0  SL     waiting_ 0xffffffff80a8dc88  [sctp_iterator]
   42     0     0     0  WL                                  [swi0: sio]
   41     0     0     0  WL                                  [irq1: atkbd0]
   40     0     0     0  WL                                  [irq15: ata1]
   39     0     0     0  WL                                  [irq14: ata0]
   38     0     0     0  SL     usbevt   0xffffff000133a420  [usb5]
   37     0     0     0  SL     usbevt   0xffffffff81065420  [usb4]
   36     0     0     0  SL     usbevt   0xffffffff81063420  [usb3]
   35     0     0     0  SL     usbevt   0xffffff000130c420  [usb2]
   34     0     0     0  WL                                  [irq22: ehci0]
   33     0     0     0  SL     usbevt   0xffffffff81061420  [usb1]
   32     0     0     0  WL                                  [irq21: pcm0 uhci1+]
   31     0     0     0  SL     usbtsk   0xffffffff80a71028  [usbtask-dr]
   30     0     0     0  SL     usbtsk   0xffffffff80a71000  [usbtask-hc]
   29     0     0     0  SL     usbevt   0xffffffff8105f420  [usb0]
   28     0     0     0  WL                                  [irq20: uhci0 uhci+]
   27     0     0     0  SL     -        0xffffff00012ef880  [em0 taskq]
   26     0     0     0  WL                                  [irq9: acpi0]
   25     0     0     0  SL     -        0xffffff0001294580  [kqueue taskq]
   24     0     0     0  WL                                  [swi6: task queue]
   23     0     0     0  RL     CPU 1                        [swi6: Giant taskq]
   22     0     0     0  SL     -        0xffffff000122c500  [thread taskq]
   21     0     0     0  WL                                  [swi5: +]
   20     0     0     0  SL     -        0xffffff000122ca80  [acpi_task_2]
   19     0     0     0  SL     -        0xffffff000122ca80  [acpi_task_1]
   18     0     0     0  SL     -        0xffffff000122ca80  [acpi_task_0]
   17     0     0     0  WL                                  [swi2: cambio]
    9     0     0     0  SL     ccb_scan 0xffffffff80a3fda0  [xpt_thrd]
   16     0     0     0  SL     -        0xffffffff80a74ea8  [yarrow]
    8     0     0     0  SL     crypto_r 0xffffffff80d293b0  [crypto returns]
    7     0     0     0  SL     crypto_w 0xffffffff80d29350  [crypto]
    6     0     0     0  SL     zfs:(&tq 0xffffff0001227080  [system_taskq]
    5     0     0     0  SL     zfs:(&tq 0xffffff0001227080  [system_taskq]
    4     0     0     0  SL     -        0xffffffff80a71838  [g_down]
    3     0     0     0  SL     -        0xffffffff80a71830  [g_up]
    2     0     0     0  SL     -        0xffffffff80a71820  [g_event]
   15     0     0     0  WL                                  [swi1: net]
   14     0     0     0  WL                                  [swi3: vm]
   13     0     0     0  LL     *Giant   0xffffff00015a2be0  [swi4: clock sio]
   12     0     0     0  RL     CPU 0                        [idle: cpu0]
   11     0     0     0  RL                                  [idle: cpu1]
    1     0     1     0  SLs    wait     0xffffff000112f8d0  [init]
   10     0     0     0  SL     audit_wo 0xffffffff80a99260  [audit]
    0     0     0     0  SLs    vmwait   0xffffffff80a9a79c  [swapper]
db> trace 77872
Tracing pid 77872 tid 100157 td 0xffffff000eb9c350
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_cv_wait() at _cv_wait+0xfe
zio_wait() at zio_wait+0x5f
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x1f6
dmu_buf_hold_array() at dmu_buf_hold_array+0x62
dmu_read_uio() at dmu_read_uio+0x3f
zfs_freebsd_read() at zfs_freebsd_read+0x535
vn_read() at vn_read+0x1ca
dofileread() at dofileread+0xa1
kern_readv() at kern_readv+0x4c
read() at read+0x54
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (3, FreeBSD ELF64, read), rip = 0x8bd74c, rsp = 0x7fffffffe2c8, rbp = 0 ---
db> trace 51
Tracing pid 51 tid 100050 td 0xffffff00014246a0
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_sx_xlock_hard() at _sx_xlock_hard+0x1ee
_sx_xlock() at _sx_xlock+0x4e
vdev_queue_io() at vdev_queue_io+0x74
vdev_geom_io_start() at vdev_geom_io_start+0x4a
vdev_mirror_io_start() at vdev_mirror_io_start+0x1b0
zil_lwb_write_start() at zil_lwb_write_start+0x2f1
zil_commit_writer() at zil_commit_writer+0x1c4
zil_commit() at zil_commit+0xb8
zfs_sync() at zfs_sync+0x9a
sync_fsync() at sync_fsync+0x1ac
sched_sync() at sched_sync+0x63f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb502fd30, rbp = 0 ---

Any ideas about this?

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 13:22:56 2008
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 771851065672 for ; Mon, 31 Mar 2008 13:22:56 +0000 (UTC) (envelope-from gary.jennejohn@freenet.de)
Received: from mout4.freenet.de (mout4.freenet.de [IPv6:2001:748:100:40::2:6]) by mx1.freebsd.org (Postfix) with ESMTP id 7BB608FC25 for ; Mon, 31 Mar 2008 13:22:55 +0000 (UTC) (envelope-from gary.jennejohn@freenet.de)
Received: from [195.4.92.14] (helo=4.mx.freenet.de) by mout4.freenet.de with esmtpa (Exim 4.69) (envelope-from ) id 1JgJyP-0002Ry-GI; Mon, 31 Mar 2008 15:22:54 +0200
Received: from x1b6f.x.pppool.de ([89.59.27.111]:35965 helo=peedub.jennejohn.org) by 4.mx.freenet.de with esmtpa (ID gary.jennejohn@freenet.de) (port 25) (Exim 4.69 #12) id 1JgJyO-0007Cv-Sn; Mon, 31 Mar 2008 15:22:53 +0200
Date: Mon, 31 Mar 2008 15:22:51 +0200
From: Gary Jennejohn
To: Attila Nagy
Message-ID: <20080331152251.62526181@peedub.jennejohn.org>
In-Reply-To: <47F0D02B.8060504@fsn.hu>
References: <47F0D02B.8060504@fsn.hu>
X-Mailer: Claws Mail 3.3.1 (GTK+ 2.10.14; amd64-portbld-freebsd8.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS hangs very often
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: gary.jennejohn@freenet.de
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 13:22:56 -0000

On Mon, 31 Mar 2008 13:51:07 +0200
Attila Nagy wrote:

> Hello,
>
> On my desktop machine I use a ZFS pool for everything but the swap and
> the root fs (so /usr, /tmp, and everything else is on ZFS; swap and /
> are on a gmirror of two partitions each).
>
> The first setup was FreeBSD/i386 7-STABLE, and the pool consisted of
> two partitions from two SATA disks, each encrypted individually with
> GELI.
>
> After some weeks of use (without any particularly heavy IO activity,
> just normal work on the machine), the first hang came: I couldn't move
> the mouse under X, but remote sessions stayed alive and the clock app
> kept counting the time fine. I couldn't log into the machine via ssh:
> the port was open, but I never got the banner. I was running a
> portupgrade at the time.
>
> After this (and several other) hangs, I removed GELI from the equation,
> without success. Then I went to one partition (disk) instead of two.
> Now I am running amd64 instead of i386, and the problem still persists.
>
> [full db> bt, ps and trace output quoted verbatim from the original
> message; trimmed]
>

I quote the entire email to preserve context, although that seems rather
excessive.

I can only say that I've observed hangs like this at the same location
in the kernel as in the first stack trace (_cv_wait -> sleepq_wait).

Since I don't have important file systems like /usr, /var etc. under
ZFS, I've always been able to recover by
a) raising the priority of the blocked process with nice, and
b) then killing the process.

Strangely enough, I've always been able to access the file system on
which the process was blocked (e.g. with ls) from a different terminal.
So the hang seems to be limited to the one process, and not to be a
symptom of ZFS itself wedging. Or maybe it's just that my ls was
accessing different parts of the filesystem not covered by the CV. FIIK.

Otherwise I have no idea what's going on. I mentioned this some time ago
(months?) on -current but never got any response. I didn't have any nice
trace then, though.
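[Archive note: the two-step recovery Gary describes (raise the blocked process's priority, then kill it) can be sketched roughly as below. This is only an illustration, not from the original thread: a background sleep stands in for the blocked process, and on a real hang you would substitute the stuck PID found via db> ps or top, e.g. 77872 for the hung cc1plus above.]

```shell
# Sketch of the recovery procedure described above. A background sleep
# stands in for the process blocked in ZFS.
sleep 300 &
PID=$!

# a) raise the process's scheduling priority with renice(8); negative
#    nice values need root, so ignore the failure when unprivileged
renice -n -10 -p "$PID" 2>/dev/null || true

# b) then kill it, escalating to SIGKILL if SIGTERM is not enough
kill -TERM "$PID"
sleep 1
if kill -0 "$PID" 2>/dev/null; then
    kill -KILL "$PID"
fi
wait "$PID" 2>/dev/null || true
echo "process $PID is gone"
```

Whether this works at all depends on why the process is blocked; a thread stuck uninterruptibly inside the kernel (state SL/D) may ignore signals until the underlying IO completes.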
---
Gary Jennejohn

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 13:58:01 2008
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5B988106564A for ; Mon, 31 Mar 2008 13:58:01 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 720038FC33 for ; Mon, 31 Mar 2008 13:57:59 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from [172.16.151.53] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id B4DE3AD834; Mon, 31 Mar 2008 15:57:47 +0200 (CEST)
Message-ID: <47F0EDD6.8060402@fsn.hu>
Date: Mon, 31 Mar 2008 15:57:42 +0200
From: Attila Nagy
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: gary.jennejohn@freenet.de
References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org>
In-Reply-To: <20080331152251.62526181@peedub.jennejohn.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS hangs very often
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 13:58:01 -0000

On 2008.03.31. 15:22, Gary Jennejohn wrote:
> On Mon, 31 Mar 2008 13:51:07 +0200
> Attila Nagy wrote:
>
>> [the original report and its full db> bt, ps and trace output were
>> quoted here in full again; trimmed]
>
> I quote the entire email to preserve context, although that seems rather
> excessive.
>
> I can only say that I've observed hangs like this at the same location
> in the kernel as in the first stack trace (_cv_wait -> sleepq_wait).
>
> Since I don't have important file systems like /usr, /var etc.
under ZFS > I've always been able to recover by > a) raising the priority of the blocked process with nice > b) then killing the process > > Strangely enough I've always been able to access the file system on which > the process was blocked (e.g. ls) from a different terminal. So the hang seems > to be limited to only the one process and not to be a symptom of ZFS itself > wedging. Or maybe it's just that my ls was accessing different parts of the > filesystem not covered by the CV? FIIK. > > Otherwise I have no idea what's going on. > > I mentioned this some time ago (months?) on -current but never got any > response. I didn't have any nice trace, though. > My system completely locks up: I can't start new processes, but running ones - which don't do I/O - can continue (for example a top). I don't know the ZFS internals (BTW, /usr and the others are of course different ZFS filesystems on the pool), but it may be that something major gets locked, and that's why it stops here. Anyway, if somebody can help to track this down, I'm here to try patches or do experiments. Thanks, ps: a -CURRENT from around a month or a month and a half ago still has this problem. 
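[Editorial note: the two-step recovery Gary quotes above (raise the blocked process's priority with nice, then kill it) can be sketched as a small shell fragment. This is a hedged illustration, not from the original mail; the function name and the default nice delta are invented for the example, and a negative delta requires root.]

```shell
# Sketch of the recovery steps described above, under the assumption
# that you know the PID of the process stuck in ZFS (e.g. from ps or
# the ddb output).  recover_blocked is a hypothetical helper name.
recover_blocked() {
    pid=$1
    delta=${2:--5}               # default: raise priority (needs root)
    renice -n "$delta" -p "$pid" # a) adjust the blocked process's priority
    kill -TERM "$pid"            # b) then kill the process
}
```

Usage would be something like `recover_blocked 77872` for the blocked reader in the trace above; escalating to `kill -KILL` may be needed if SIGTERM is ignored.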
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 14:15:14 2008 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67A17106564A for ; Mon, 31 Mar 2008 14:15:14 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 248C18FC17 for ; Mon, 31 Mar 2008 14:15:13 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from [172.16.151.53] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id 8BA1FADB00; Mon, 31 Mar 2008 16:15:06 +0200 (CEST) Message-ID: <47F0F1E8.1080504@fsn.hu> Date: Mon, 31 Mar 2008 16:15:04 +0200 From: Attila Nagy User-Agent: Thunderbird 2.0.0.12 (Windows/20080213) MIME-Version: 1.0 To: gary.jennejohn@freenet.de References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> In-Reply-To: <47F0EDD6.8060402@fsn.hu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Mar 2008 14:15:14 -0000 On 2008.03.31. 15:57, Attila Nagy wrote: > My system completely locks up, I can't start new processes, but > runnings ones -which don't do IO- can continue (for example a top). > I don't know ZFS internals (BTW, /usr and others are of course > different ZFS filesystems on the pool), but it might be, that > something major gets locked and that's why it stops here. > > Anyways, if somebody can help to back this out, I'm here to try > patches, or do experiments. 
I forgot to tell -I don't know, maybe it's important-, that I have an SMP box (but tried with UP kernel, the effect is the same) and compression is enabled on every filesystems. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 04:46:50 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E85751065689; Tue, 1 Apr 2008 04:46:49 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 24C488FC23; Tue, 1 Apr 2008 04:46:48 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id DC1CF4D5CC0; Tue, 1 Apr 2008 12:28:09 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id 415B755E4AC for ; Thu, 27 Mar 2008 13:37:20 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id B033215637B; Thu, 27 Mar 2008 05:36:44 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id E74D91065768; Thu, 27 Mar 2008 05:36:42 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 493011065670 for ; Thu, 27 Mar 2008 05:36:33 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 069F28FC1E for ; Thu, 27 Mar 2008 05:36:32 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Jekmm-0004CW-7x for freebsd-current@freebsd.org; Thu, 27 Mar 2008 05:36:24 +0000 Received: 
from 195.208.174.178 ([195.208.174.178]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 05:36:24 +0000 Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 05:36:24 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-current@freebsd.org From: Vadim Goncharov Followup-To: gmane.os.freebsd.current Date: Thu, 27 Mar 2008 05:36:15 +0000 (UTC) Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel Lines: 22 Message-ID: References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 195.208.174.178 X-Comment-To: Robert Watson User-Agent: slrn/0.9.8.1 (FreeBSD) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org Reply-To: vadim_nuclight@mail.ru List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 04:46:50 -0000 Hi Robert Watson! On Wed, 26 Mar 2008 14:53:25 +0000 (GMT); Robert Watson wrote about 'Re: unionfs status': > You can imagine a number of schemes to replicate pointer changes around or > track the various outstanding references, but I think a more fundamental > question is whether this is in fact the right behavior at all. The premise of > is that writes flow up, but not down, and "connections" to sockets are > read-write events, not read events, most typically. If you're using unionfs > to take a template system and "broadcast it" to many jails, you probably don't > want all the jails talking to the same syslogd, you want them each talking to > their own. 
When syslogd in a jail finds a disconnected socket, which is > effectively what a NULL v_socket pointer means, in /var/run/log, it should be > unlinking it and creating a new socket, not reusing the existing file on disk. This code's use in jails is primarily intended for mysql (and the like daemons), not syslogd (for which you said it right). Such daemons really require broadcasting, yep - so unionfs should support it... -- WBR, Vadim Goncharov. ICQ#166852181 mailto:vadim_nuclight@mail.ru [Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight] _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 04:53:21 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 89B1A1065670; Tue, 1 Apr 2008 04:53:21 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 10E8C8FC28; Tue, 1 Apr 2008 04:53:21 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 226574D626B; Tue, 1 Apr 2008 12:33:53 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id 8592755E4A8 for ; Thu, 27 Mar 2008 21:58:24 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id D08991A8F09; Thu, 27 Mar 2008 13:56:57 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 5A3F21065716; Thu, 27 Mar 2008 13:56:55 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) 
Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B67951065676; Thu, 27 Mar 2008 13:56:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 72A898FC34; Thu, 27 Mar 2008 13:56:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 4949846B96; Thu, 27 Mar 2008 09:56:40 -0400 (EDT) Date: Thu, 27 Mar 2008 13:56:40 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Vadim Goncharov In-Reply-To: Message-ID: <20080327135318.R73942@fledge.watson.org> References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> <20080327062556.GE3180@home.opsec.eu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 04:53:21 -0000 On Thu, 27 Mar 2008, Vadim Goncharov wrote: >> Thanks for this description. So we basically have two different uses for >> UNIX sockets in unionfs with jails ? > >> 1) socket in jail to communicate only inside one jail (syslog-case) 2) >> socket in jail as a means of IPC between different jails (mysql-case) > >> Is 2) really supposed to work like this ? > > This is user's/admin's point of view, that it should work this way: one > mysql with one socket for several jails. I don't know all gory details about > how code really works. 
As I see it, nullfs should provide a shared socket, it is intended to provide access to the same object, and unionfs should provide independent sockets, as unionfs is intended to provide isolation. Robert N M Watson Computer Laboratory University of Cambridge _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:05:51 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 49D6210661EC for ; Tue, 1 Apr 2008 05:05:51 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 7D4F08FC1C for ; Tue, 1 Apr 2008 05:05:50 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id D3A094D52EA; Tue, 1 Apr 2008 12:24:25 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id 0F31F55E498 for ; Thu, 27 Mar 2008 14:55:55 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 60193162CBA; Thu, 27 Mar 2008 06:55:18 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 66573106566B; Thu, 27 Mar 2008 06:55:14 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 64518106566C for ; Thu, 27 Mar 2008 06:55:05 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org 
[80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1E52B8FC20 for ; Thu, 27 Mar 2008 06:55:04 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from root by ciao.gmane.org with local (Exim 4.43) id 1Jem0s-0007K2-V4 for freebsd-current@freebsd.org; Thu, 27 Mar 2008 06:55:02 +0000 Received: from 195.208.174.178 ([195.208.174.178]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 06:55:02 +0000 Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 06:55:02 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-current@freebsd.org From: Vadim Goncharov Followup-To: gmane.os.freebsd.current Date: Thu, 27 Mar 2008 06:51:37 +0000 (UTC) Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel Lines: 30 Message-ID: References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> <20080327062556.GE3180@home.opsec.eu> X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 195.208.174.178 X-Comment-To: Kurt Jaeger User-Agent: slrn/0.9.8.1 (FreeBSD) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org Reply-To: vadim_nuclight@mail.ru List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:05:51 -0000 Hi Kurt Jaeger! On Thu, 27 Mar 2008 07:25:56 +0100; Kurt Jaeger wrote about 'Re: unionfs status': >>> If you're using unionfs >>> to take a template system and "broadcast it" to many jails, you probably don't >>> want all the jails talking to the same syslogd, you want them each talking to >>> their own. 
When syslogd in a jail finds a disconnected socket, which is >>> effectively what a NULL v_socket pointer means, in /var/run/log, it should be >>> unlinking it and creating a new socket, not reusing the existing file on disk. >> This code's use in jails is primarily intended for mysql (and the like >> daemons), not syslogd (for which you said it right). Such daemons really >> require broadcasting, yep - so unionfs should support it... > Thanks for this description. So we basically have two different > uses for UNIX sockets in unionfs with jails ? > 1) socket in jail to communicate only inside one jail (syslog-case) > 2) socket in jail as a means of IPC between different jails (mysql-case) > Is 2) really supposed to work like this ? This is user's/admin's point of view, that it should work this way: one mysql with one socket for several jails. I don't know all gory details about how code really works. -- WBR, Vadim Goncharov. ICQ#166852181 mailto:vadim_nuclight@mail.ru [Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight] _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:05:54 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 370B710662AA for ; Tue, 1 Apr 2008 05:05:54 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id B3BA08FC14 for ; Tue, 1 Apr 2008 05:05:53 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 0DA264D53DC; Tue, 1 Apr 2008 12:24:53 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my 
(Postfix) with ESMTP id 14BBF55E4F9 for ; Thu, 27 Mar 2008 14:28:37 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 9475F1A50D9; Thu, 27 Mar 2008 06:26:06 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 9B0331065676; Thu, 27 Mar 2008 06:26:06 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 02E45106566C; Thu, 27 Mar 2008 06:25:58 +0000 (UTC) (envelope-from lists@c0mplx.org) Received: from home.opsec.eu (unknown [IPv6:2001:14f8:200::1]) by mx1.freebsd.org (Postfix) with ESMTP id B34128FC15; Thu, 27 Mar 2008 06:25:57 +0000 (UTC) (envelope-from lists@c0mplx.org) Received: from pi by home.opsec.eu with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1JelYi-000CgM-Qt; Thu, 27 Mar 2008 07:25:56 +0100 Date: Thu, 27 Mar 2008 07:25:56 +0100 From: Kurt Jaeger To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org Message-ID: <20080327062556.GE3180@home.opsec.eu> References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:05:54 -0000 Vadim Goncharov wrote: > Robert Watson wrote: > > If you're using unionfs > > to take a template system and "broadcast it" to many jails, you probably don't > > want all the jails talking to the 
same syslogd, you want them each talking to > > their own. When syslogd in a jail finds a disconnected socket, which is > > effectively what a NULL v_socket pointer means, in /var/run/log, it should be > > unlinking it and creating a new socket, not reusing the existing file on disk. > This code's use in jails is primarily intended for mysql (and the like > daemons), not syslogd (for which you said it right). Such daemons really > require broadcasting, yep - so unionfs should support it... Thanks for this description. So we basically have two different uses for UNIX sockets in unionfs with jails ? 1) socket in jail to communicate only inside one jail (syslog-case) 2) socket in jail as a means of IPC between different jails (mysql-case) Is 2) really supposed to work like this ? -- pi@opsec.eu +49 171 3101372 12 years to go ! _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:09:29 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 031301066FA8; Tue, 1 Apr 2008 05:09:29 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id EC1B08FC1B; Tue, 1 Apr 2008 05:09:27 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 0FAA44D58D7; Tue, 1 Apr 2008 12:11:03 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id E4B1755E491 for ; Wed, 26 Mar 2008 23:59:54 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id ADC4C1A5B7C; Wed, 26 Mar 2008 15:59:17 
+0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id C472E106573A; Wed, 26 Mar 2008 15:59:16 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A695F106566B; Wed, 26 Mar 2008 15:59:01 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 7BA3F8FC12; Wed, 26 Mar 2008 15:59:01 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id 8F299125438; Thu, 27 Mar 2008 00:39:20 +0900 (JST) Message-ID: <47EA6E27.3060006@freebsd.org> Date: Thu, 27 Mar 2008 00:39:19 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080325) MIME-Version: 1.0 To: "Alexander V. Chernikov" References: <47E9448F.1010304@ipfw.ru> In-Reply-To: <47E9448F.1010304@ipfw.ru> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@FreeBSD.org, Kurt Jaeger , Robert Watson , dindin@yandex-team.ru Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:09:29 -0000 I should say that so sorry of my slow response. so sorry. We are developing unionfs step by step and still have 5 next patches. 
http://people.freebsd.org/~daichi/unionfs/experiments/
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-1.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-2.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-3.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-4.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-5.diff
p20-1: fixes a panic triggered when "no error happens, eofflag is 0, response data is empty and DIAGNOSTIC is defined" while invoking VOP_READDIR(9) from unionfs. This change also fixes a system hang-up when used with NFS.
p20-2: fixes an fs access issue when mounting on devfs.
p20-3: fixes kern/109377.
p20-4: fixes a rename panic issue.
p20-5: fixes a unix socket connection issue.
In our long-running unionfs tests, it looks like it works very well. Would you try the above patches? Sorry again for my slow response; please accept my deepest apology. We are planning to commit the above patches to 8-CURRENT. 7-RELEASE has been done, so it is a good time to commit them to CURRENT ;) Alexander V. Chernikov wrote: > Hello people! 
> > At this moment unionfs has got at least following problems: > 1) File systems cannot mount onto upper/lower unionfs layer (partially > described in kern/117829) > 2) There are problems with multithreaded programs accessing(writing) > files on unionfs (kern/109950) > 3) As well there are problems with accessing unix sockets created on > upper/lower unionfs layers (kern/118346) > 4) Doing mv filename same-filename causes kernel to panic on 6.X (and > printing warning about VOP_RENAME in 7+) > 5) Making 'loops' when mounting unionfs causes kernel panic (kern/121385) > > I have made patches solving first 4 problems > These patches are available at http://ipfw.ru/patches/ > unionfs2.diff fixes fs mounting onto upper layer, unionfs_lmount.diff > fixes lower > unionfs_threads.diff and unionfs_unix.diff fixes cases 2) and 3) > unionfs_rename.diff fixes case with renaming > > Can anybody comment/review ? -- Daichi GOTO, http://people.freebsd.org/~daichi _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:09:29 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 047541066FA9 for ; Tue, 1 Apr 2008 05:09:29 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id CC9988FC18 for ; Tue, 1 Apr 2008 05:09:27 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 037FB4D586E; Tue, 1 Apr 2008 12:10:58 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id E14D455E48B for ; Wed, 26 Mar 2008 22:55:10 +0800 (MYT) Received: from 
hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id B7FDA1A5CCF; Wed, 26 Mar 2008 14:53:40 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 3017E1065707; Wed, 26 Mar 2008 14:53:39 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3E5D106566B; Wed, 26 Mar 2008 14:53:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 649518FC1F; Wed, 26 Mar 2008 14:53:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 2808C46B04; Wed, 26 Mar 2008 10:53:25 -0400 (EDT) Date: Wed, 26 Mar 2008 14:53:25 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: "Alexander V. Chernikov" In-Reply-To: <47E9448F.1010304@ipfw.ru> Message-ID: <20080326142115.K34007@fledge.watson.org> References: <47E9448F.1010304@ipfw.ru> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@FreeBSD.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:09:29 -0000 On Tue, 25 Mar 2008, Alexander V. 
Chernikov wrote: > I have made patches solving first 4 problems These patches are available at > http://ipfw.ru/patches/ unionfs2.diff fixes fs mounting onto upper layer, > unionfs_lmount.diff fixes lower unionfs_threads.diff and unionfs_unix.diff > fixes cases 2) and 3) unionfs_rename.diff fixes case with renaming > > Can anybody comment/review ? Dear Alexander, Unfortunately, I don't know too much about unionfs. However, I can comment on the UNIX domain socket patch: > --- sys/fs/unionfs/union_subr.c.orig 2008-03-13 23:10:32.000000000 +0300 > +++ sys/fs/unionfs/union_subr.c 2008-03-13 23:17:34.000000000 +0300 > @@ -160,6 +160,8 @@ > unp->un_path[cnp->cn_namelen] = '\0'; > } > vp->v_type = (uppervp != NULLVP ? uppervp->v_type : lowervp->v_type); > + if (vp->v_type == VSOCK) > + vp->v_socket = (uppervp != NULLVP) ? uppervp->v_socket : lowervp->v_socket; > if ((lowervp != NULLVP) && (lowervp->v_type == VDIR)) > vp->v_mountedhere = lowervp->v_mountedhere; > vp->v_data = unp; I'm a bit worried about this assignment, as it represents an untracked alias for the socket. Let me explain why: UNIX domain sockets may have file system bindings, allowing them to use the file system namespace as a rendezvous for communication. Typical use is that a socket is created, bind() is called on it with a path in some location like /var/run/log. Other processes turn up and connect() to the path, causing a file system lookup to reach the vnode of the socket, and then the socket code follows vp->v_socket to find the socket to connect to. When a bound socket is closed, we follow a back-pointer from the UNIX domain socket to the vnode, and then clear the pointer. Doing this in a race-free manner is somewhat tricky, and I'm not 100% convinced it's correct currently, although it appears to be somewhat close to right. 
The upshot of all this is that if you copy the pointer value to other vnodes, such as vnodes on the upper layer, the UNIX domain socket code won't clear those pointers before freeing the socket they point at. This means that the above code snippet may lead to a v_socket pointer on a higher-layer vnode pointing at the right socket, the wrong socket, or possibly some other bit of freed and maybe reused memory. You can imagine a number of schemes to replicate pointer changes around or track the various outstanding references, but I think a more fundamental question is whether this is in fact the right behavior at all. The premise of unionfs is that writes flow up, but not down, and "connections" to sockets are read-write events, not read events, most typically. If you're using unionfs to take a template system and "broadcast it" to many jails, you probably don't want all the jails talking to the same syslogd; you want them each talking to their own. When syslogd in a jail finds a disconnected socket, which is effectively what a NULL v_socket pointer means, in /var/run/log, it should be unlinking it and creating a new socket, not reusing the existing file on disk.
Robert N M Watson Computer Laboratory University of Cambridge _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:24:49 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C34A410656CE; Tue, 1 Apr 2008 05:24:49 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 1C7968FC22; Tue, 1 Apr 2008 05:24:49 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id C6FFB4D50C4; Tue, 1 Apr 2008 13:17:26 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id BDCF855E487 for ; Fri, 28 Mar 2008 03:23:43 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 7BF2C1A72A1; Thu, 27 Mar 2008 19:22:26 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 8773F106576A; Thu, 27 Mar 2008 19:22:23 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 939D5106566B for ; Thu, 27 Mar 2008 19:22:13 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outB.internet-mail-service.net (outb.internet-mail-service.net [216.240.47.225]) by mx1.freebsd.org (Postfix) with ESMTP id 8088A8FC23 for ; Thu, 27 Mar 2008 19:22:13 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com 
(HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Thu, 27 Mar 2008 17:44:19 -0700 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 4EAA12D6010; Thu, 27 Mar 2008 12:22:12 -0700 (PDT) Message-ID: <47EBF3E4.4000607@elischer.org> Date: Thu, 27 Mar 2008 12:22:12 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Kurt Jaeger References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> <20080327062556.GE3180@home.opsec.eu> In-Reply-To: <20080327062556.GE3180@home.opsec.eu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:24:49 -0000 Kurt Jaeger wrote: > Vadim Goncharov wrote: >> Robert Watson wrote: > >>> If you're using unionfs >>> to take a template system and "broadcast it" to many jails, you probably don't >>> want all the jails talking to the same syslogd, you want them each talking to >>> their own. When syslogd in a jail finds a disconnected socket, which is >>> effectively what a NULL v_socket pointer means, in /var/run/log, it should be >>> unlinking it and creating a new socket, not reusing the existing file on disk. > >> This code's use in jails is primarily intended for mysql (and similar >> daemons), not syslogd (about which you are right). Such daemons really >> require broadcasting, yep - so unionfs should support it... > > Thanks for this description.
So we basically have two different > uses for UNIX sockets in unionfs with jails? > > 1) socket in jail to communicate only inside one jail (syslog-case) > 2) socket in jail as a means of IPC between different jails (mysql-case) > > Is 2) really supposed to work like this? Think about it: the socket is a file interface to a process. If you are reading the same socket, you expect to get the same process. In (1) you put the socket somewhere not shared; in (2) you put the socket somewhere shared. In nullfs you are allowing access to the same vnode via several namespace positions, so a new socket is visible to all jails. In unionfs a new socket would replace the old one and thus be only locally visible (it refers to a different vnode from those accessed by the same name in other mounts). > _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 07:54:44 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 098671065670 for ; Tue, 1 Apr 2008 07:54:44 +0000 (UTC) (envelope-from wangyi6854@gmail.com) Received: from ti-out-0910.google.com (ti-out-0910.google.com [209.85.142.187]) by mx1.freebsd.org (Postfix) with ESMTP id 8CDE38FC2D for ; Tue, 1 Apr 2008 07:54:43 +0000 (UTC) (envelope-from wangyi6854@gmail.com) Received: by ti-out-0910.google.com with SMTP id j2so619585tid.3 for ; Tue, 01 Apr 2008 00:54:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=DJW7dMd74MIdDXgD33AMUdw6IjftB2ZRGg2PHvDAybU=;
b=cesp+97OYGZm/XliwUrwCb+63W46AyJ7u5S3q/7BHClJYSv1RicWH5BNFtgrUwD0wbBClWFje/ElgyKB8IMRH7b+wiSrhxPLfH5X7Vw53xP4Mr7R+RUT6p97toVDMfDFP7r/0OeD56kYSa+iw0rrX4z4VFbFKBfYplse7hG50Ic= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=bNwpShVcS6Kzw8Ku2mqPB7kVBvAzYhTzILfCz12WbSqSFyf2DsjxYMaWHvLCWJJfM2aPAnjKNKSY6FTHgVAOdghyrAp2r8xbfylwuR8wv3KNaQRBXQ2e2bYH6w1xdwBY/JkDDfT3+rKpLZL6M2jBtSEoytrqai04O7cGisgVol4= Received: by 10.110.31.11 with SMTP id e11mr3341169tie.56.1207034859245; Tue, 01 Apr 2008 00:27:39 -0700 (PDT) Received: by 10.110.10.14 with HTTP; Tue, 1 Apr 2008 00:27:39 -0700 (PDT) Message-ID: <5ea5cca50804010027k51b59658mb28a481c516e84b0@mail.gmail.com> Date: Tue, 1 Apr 2008 15:27:39 +0800 From: "Yi Wang" To: "Attilio Rao" In-Reply-To: <3bbf2fe10802061700p253e68b8s704deb3e5e4ad086@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <3bbf2fe10802061700p253e68b8s704deb3e5e4ad086@mail.gmail.com> Cc: Yar Tikhiy , Doug Barton , Jeff Roberson , freebsd-fs@freebsd.org, Scot Hetzel , freebsd-arch@freebsd.org Subject: Re: [RFC] Remove NTFS kernel support X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 07:54:44 -0000 On 2/7/08, Attilio Rao wrote: > As exposed by several users, NTFS seems to have been broken even before the first > VFS commits happening around the end of December. Those commits exposed > some problems with NTFS which are currently under investigation. > Ultimately, this filesystem is also unmaintained at the moment. > > Speaking with jeff, we agreed on what could be a possible compromise: > remove the kernel support for NTFS and maybe take care of the FUSE > implementation.
> What I now propose is a small survey which can shed some light > on what you think about this idea and its implications: > - Do you use NTFS? Yes. I have a dual-boot machine. > - Are you interested in maintaining it? No. I'm not familiar with kernel/fs programming. > - Do you know a good reason not to use the FUSE ntfs implementation? What > does the kernel counterpart add? Yes: listening to music and watching video on NTFS disks stalls frequently when using ntfs-3g. What the kernel counterpart adds, I've no idea. > - Do you think axing the kernel support is a good idea? For servers, yes. For desktops, NO! > > Thanks, > Attilio > > > > -- > Peace can only be achieved by understanding - A. Einstein > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Regards, Wang Yi From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 20:15:56 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 603751065676 for ; Tue, 1 Apr 2008 20:15:56 +0000 (UTC) (envelope-from crahman@gmail.com) Received: from gv-out-0910.google.com (gv-out-0910.google.com [216.239.58.184]) by mx1.freebsd.org (Postfix) with ESMTP id E07358FC2A for ; Tue, 1 Apr 2008 20:15:55 +0000 (UTC) (envelope-from crahman@gmail.com) Received: by gv-out-0910.google.com with SMTP id n40so441543gve.39 for ; Tue, 01 Apr 2008 13:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; bh=pbI5ViE5a2V95xufsRG0mCZHe6Bh/gcosBz0qc72gbE=; b=MyvLSjPsKYorNPN8IRxeZOQa0AJJ5ypVzACFHYvIslWWoQqsg1SFntzvHsqI9PgweM7CL0PpHIJMn1d61s9L9k4VdL/vdbjEuTJcIv781Qs+7QMr1FHRMBpCezpYysx6AxtACuvVGI6CrQsCiBlw1pJ1AJS5VtwGQpxhYlVpOLA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=gJoZwdu0AM/SUPfSyG0tnbCTzCZHj4HPIOgq5+zgON1aYp64umCTteo3iEvkrm8hj/7mW288qB/KvLbd4YaNEdj+ObEPshn/9sWzsswwecDBZseWiIofIjuaXEVkfx10FVwOY37Lp1zlsZo0tDsO5QIuXl5I2JwKUctSdUljcDY= Received: by 10.142.240.9 with SMTP id n9mr5273269wfh.136.1207079487231; Tue, 01 Apr 2008 12:51:27 -0700 (PDT) Received: by 10.142.188.17 with HTTP; Tue, 1 Apr 2008 12:51:27 -0700 (PDT) Message-ID: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> Date: Tue, 1 Apr 2008 13:51:27 -0600 From: "Cyrus Rahman" To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Subject: Trouble with snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 20:15:56 -0000 I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE right now. I haven't been able to set up a test environment to really determine precise details, but this much I know: Filesystem i/o will eventually lock up, requiring a hard reset, after the snapshot mount sleeps permanently on suspfs. Eventually there's a cascade and everything ends up waiting on suspfs. Running a 'sync' after mount hangs is a sure way to propagate the problem. This happens very often - probably 15% probability per snapshot on the server running 7.0. It's bad enough so that it's not realistic to use snapshots there. Other strange things have been observed, in that an entire day's worth of work vanished - after the reset/reboot the filesystems were consistent, but in the state they were in many hours before, at the time the snapshot hung. 
The snapshot had been observed hanging, but everything else seemed to work, so a decision was made to reboot at the end of the day - with disastrous effect! During the day nothing unusual except for the hung snapshot was noticed. I'm guessing everything just got cached (for hours!) and the cache never got flushed. This is happening on a system set up with journaled ufs filesystems, so that may be part of the problem. The system is running amd64 with an Intel Q6600. The filesystem that has trouble with this has a number of large files of about 500-700 MB on it. Filesystems with only small files do not seem to have trouble, even though they are bigger filesystems with more files. I can't think of anything else unique about it. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 20:19:10 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 737431065673; Tue, 1 Apr 2008 20:19:10 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D18048FC1C; Tue, 1 Apr 2008 20:19:08 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <47F298C2.7040606@FreeBSD.org> Date: Tue, 01 Apr 2008 22:19:14 +0200 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Cyrus Rahman References: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> In-Reply-To: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: Trouble with snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 20:19:10 -0000 Cyrus Rahman wrote:
> I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE > right now. I haven't been able to set up a test environment to really > determine precise details, but this much I know: Filesystem i/o will > eventually lock up, requiring a hard reset, after the snapshot mount > sleeps permanently on suspfs. Eventually there's a cascade and > everything ends up waiting on suspfs. Running a 'sync' after mount > hangs is a sure way to propagate the problem. This happens very often > - probably 15% probability per snapshot on the server running 7.0. > It's bad enough so that it's not realistic to use snapshots there. > Other strange things have been observed, in that an entire day's worth > of work vanished - after the reset/reboot the filesystems were consistent, > but in the state they were in many hours before, at the time the snapshot > hung. The snapshot had been observed hanging, but everything else seemed > to work so a decision was made to reboot at the end of the day - with > disastrous effect! During the day nothing unusual except for the hung > snapshot was noticed. I'm guessing everything just got cached (for > hours!) and the cache never got flushed. > > This is happening on a system set up with journaled ufs filesystems, > so that may be part of the problem. The system is running amd64 with > an Intel Q6600. I thought gjournal and soft updates were supposed to be mutually exclusive (the latter is required for UFS snapshots). Anyway, even if they are supposed to work together this interaction is almost certainly the cause. 
Kris From owner-freebsd-fs@FreeBSD.ORG Wed Apr 2 07:31:10 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C2AE3106564A for ; Wed, 2 Apr 2008 07:31:10 +0000 (UTC) (envelope-from crahman@gmail.com) Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.169]) by mx1.freebsd.org (Postfix) with ESMTP id 3E09F8FC21 for ; Wed, 2 Apr 2008 07:31:10 +0000 (UTC) (envelope-from crahman@gmail.com) Received: by wf-out-1314.google.com with SMTP id 25so2565614wfa.7 for ; Wed, 02 Apr 2008 00:31:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=UxUEHjHBUWON88Pm/JD62zdt6D4AIDI6s3WxZ8PB4qs=; b=Mi7Rol4xzFyp6ZRDqRhD7wkYCF8z07sNS0PNR7StdzwqJS2QOWCgiAa6muRJFkDNY6EDRgJ0MqBLlzkrcpXAvF6eV7kXyGo8DpV52ZoKdLfA2IvgVE8ndvJI2sGAqcBWwp0tzxjQOMnofGD7+wgGxejiRrXQURxDdE39vhegdXI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=o3PLamhvsPyArxLDwZFBwFYrGeRampiVJEZfwvQZUzlBLDZepuZKu1+c/3qqKMfX/Mnbzxg9kMYzFIi15XiAgrZktTOap0hjxc8CRKVq4Eg24AHxxZrCqDAo/0BI0PyqsIp20REEfQYHB2gOscEC+lkGypQ1d1KwPuqF5i/eJ9E= Received: by 10.142.226.2 with SMTP id y2mr5639536wfg.137.1207121470022; Wed, 02 Apr 2008 00:31:10 -0700 (PDT) Received: by 10.142.188.17 with HTTP; Wed, 2 Apr 2008 00:31:10 -0700 (PDT) Message-ID: <9e77bdb50804020031r2fba0840g7281e879522120d5@mail.gmail.com> Date: Wed, 2 Apr 2008 01:31:10 -0600 From: "Cyrus Rahman" To: "Kris Kennaway" In-Reply-To: <47F298C2.7040606@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: 
<9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> <47F298C2.7040606@FreeBSD.org> Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: Trouble with snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Apr 2008 07:31:10 -0000 > > This is happening on a system set up with journaled ufs filesystems, > > so that may be part of the problem. The system is running amd64 > > with an Intel Q6600. > > I thought gjournal and soft updates were supposed to be mutually > exclusive (the latter is required for UFS snapshots). Anyway, even if > they are supposed to work together this interaction is almost certainly > the cause. I actually think that snapshots are a part of UFS2 and that they work just fine with or without soft updates. I was wondering if the problems I've seen are limited strictly to gjournal-based UFS2 systems. I'm guessing that they are, based upon the fact that the problems are dramatic enough that they would have shown up in discussion if they were widespread. But I also wondered if perhaps the additional concurrency associated with multiple processors might be a factor. As it is, it may be prudent for someone intending to use dump with snapshots to hold off on building filesystems with gjournal until this is resolved. Other than this problem, the gjournal/ufs integration has worked flawlessly here.
From owner-freebsd-fs@FreeBSD.ORG Wed Apr 2 14:38:03 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3216106564A for ; Wed, 2 Apr 2008 14:38:03 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 8B3688FC1B for ; Wed, 2 Apr 2008 14:38:03 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Jh46D-000777-0y for freebsd-fs@freebsd.org; Wed, 02 Apr 2008 14:38:01 +0000 Received: from firewall.andxor.it ([195.223.2.2]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 02 Apr 2008 14:38:01 +0000 Received: from lapo by firewall.andxor.it with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 02 Apr 2008 14:38:01 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Lapo Luchini Date: Wed, 02 Apr 2008 16:37:49 +0200 Lines: 23 Message-ID: References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> <47F0F1E8.1080504@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: firewall.andxor.it User-Agent: Thunderbird 2.0.0.12 (X11/20080303) In-Reply-To: <47F0F1E8.1080504@fsn.hu> X-Enigmail-Version: 0.95.6 OpenPGP: id=C8F252FB Sender: news Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Apr 2008 14:38:03 -0000 Attila Nagy wrote: > On 2008.03.31. 
15:57, Attila Nagy wrote: >> My system completely locks up, I can't start new processes, but >> running ones -which don't do IO- can continue (for example a top). >> I don't know ZFS internals (BTW, /usr and others are of course >> different ZFS filesystems on the pool), but it might be that >> something major gets locked and that's why it stops here. > I forgot to mention -I don't know, maybe it's important- that I have an > SMP box (but I tried with a UP kernel, the effect is the same) and > compression is enabled on every filesystem. I have similar symptoms on a dual AMD64, 4x SATA GELI + RAIDZ, mainly after I turned off one drive out of 4 in a RAIDZ pool (one of the two SATA channels on the motherboard is flaky; I'm waiting for a new PCI controller). I can consistently reproduce it by mdconfig-uring a 120 GB image of a ddrescue-d HDD, mounting a UFS2 partition on it, and moving massive amounts of data onto it: that hangs within a few minutes. But it will lock up after a few hours anyway, even without touching that huge file. I'll try to produce some debugging information myself ASAP...
Lapo From owner-freebsd-fs@FreeBSD.ORG Thu Apr 3 11:14:09 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 903BF1065671 for ; Thu, 3 Apr 2008 11:14:09 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 04F838FC18 for ; Thu, 3 Apr 2008 11:14:08 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m33Ad5ks051636 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 3 Apr 2008 12:39:05 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m33AcxaG038665 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 3 Apr 2008 12:39:00 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m33AcxrE034481; Thu, 3 Apr 2008 12:38:59 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m33AcxNh034480; Thu, 3 Apr 2008 12:38:59 +0200 (CEST) (envelope-from ticso) Date: Thu, 3 Apr 2008 12:38:59 +0200 From: Bernd Walter To: Attila Nagy Message-ID: <20080403103858.GX15954@cicely12.cicely.de> References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F0EDD6.8060402@fsn.hu> X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha User-Agent: Mutt/1.5.9i X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, BAYES_00=-2.599 autolearn=ham version=3.2.3 
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on cicely12.cicely.de Cc: freebsd-fs@freebsd.org Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Apr 2008 11:14:09 -0000 On Mon, Mar 31, 2008 at 03:57:42PM +0200, Attila Nagy wrote: > On 2008.03.31. 15:22, Gary Jennejohn wrote: > >On Mon, 31 Mar 2008 13:51:07 +0200 > >Attila Nagy wrote: > My system completely locks up, I can't start new processes, but running > ones -which don't do IO- can continue (for example a top). > I don't know ZFS internals (BTW, /usr and others are of course different > ZFS filesystems on the pool), but it might be that something major gets > locked and that's why it stops here. You can renice and kill a process using top, so if you have a running top you can still test this. I've seen this kind of hang as well, but since updating to -CURRENT from 15th March it has never happened again. I'm not aware of any commit that might have fixed it, though, so in the end it might just be luck. I didn't investigate the problem very much, because I have had several timeout problems with drives -which otherwise run fine- after adding further drives; this turned out to be an insufficient power supply, and every time I accessed the second pool at the same time I ran into trouble with the drives on the first pool. SATA drives seem to be too crappy to tell why they fail :( > ps: -CURRENT from around a month/half months ago still has this problem. Not for me, it seems, but as said above, it may be luck. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and more.
From owner-freebsd-fs@FreeBSD.ORG Sat Apr 5 08:12:46 2008 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3A129106566B; Sat, 5 Apr 2008 08:12:46 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 025618FC1C; Sat, 5 Apr 2008 08:12:46 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m358CjAx058391; Sat, 5 Apr 2008 08:12:45 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m358CjmL058387; Sat, 5 Apr 2008 08:12:45 GMT (envelope-from remko) Date: Sat, 5 Apr 2008 08:12:45 GMT Message-Id: <200804050812.m358CjmL058387@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-i386@FreeBSD.org, freebsd-fs@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Apr 2008 08:12:46 -0000 Old Synopsis: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 New Synopsis: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 Responsible-Changed-From-To: freebsd-i386->freebsd-fs Responsible-Changed-By: remko Responsible-Changed-When: Sat Apr 5 08:11:45 UTC 2008 Responsible-Changed-Why: The backtraces show that amd(8) has a problem, reassign to the fs team to investigate this. http://www.freebsd.org/cgi/query-pr.cgi?pr=122172