From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 11:07:00 2008
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 222571065670 for ; Mon, 31 Mar 2008 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 1365E8FC14 for ; Mon, 31 Mar 2008 11:07:00 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m2VB6xm6038892 for ; Mon, 31 Mar 2008 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m2VB6xAq038888 for freebsd-fs@FreeBSD.org; Mon, 31 Mar 2008 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 31 Mar 2008 11:06:59 GMT
Message-Id: <200803311106.m2VB6xAq038888@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 11:07:00 -0000

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha

4 problems total.
Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o bin/118249   fs         mv(1): moving a directory changes its mtime

6 problems total.

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 12:06:59 2008
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DCED106574C for ; Mon, 31 Mar 2008 12:06:57 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id C9C898FC17 for ; Mon, 31 Mar 2008 12:06:56 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from [172.16.129.140] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id 32DC6AF422 for ; Mon, 31 Mar 2008 13:51:13 +0200 (CEST)
Message-ID: <47F0D02B.8060504@fsn.hu>
Date: Mon, 31 Mar 2008 13:51:07 +0200
From: Attila Nagy
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc:
Subject: ZFS hangs very often
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 12:06:59 -0000

Hello,

On my desktop machine I use a ZFS pool for everything but the swap and
the root fs (so /usr, /tmp, and everything else is on ZFS; swap and /
are on a gmirror of two partitions each).
The first setup was FreeBSD/i386 7-STABLE, and the pool consisted of
two partitions from two SATA disks, each encrypted individually with
GELI.

After some weeks of use (without any particularly heavy IO activity,
just normal work on the machine), the first hang came: I couldn't move
the mouse under X, but remote sessions stayed alive and the clock app
kept counting the time fine. I couldn't log into the machine via ssh:
the port was open, but I never got the banner. I was running a
portupgrade at the time.

After this (and several other) hangs, I removed GELI from the equation,
without success. Then I went to one partition (disk) instead of two.
Now I am running amd64 instead of i386, and the problem still persists.

I've attached my notebook to the machine, and here is what I got during
the hang (I am currently in the process of upgrading some ports; a
configure script is trying to run, but the machine has stopped):

KDB: enter: manual escape to debugger
[thread pid 23 tid 100022 ]
Stopped at kdb_enter+0x31: leave
db> bt
Tracing pid 23 tid 100022 td 0xffffff000127c350
kdb_enter() at kdb_enter+0x31
scgetc() at scgetc+0x461
sckbdevent() at sckbdevent+0xa4
kbdmux_intr() at kbdmux_intr+0x43
kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
taskqueue_run() at taskqueue_run+0x9f
ithread_loop() at ithread_loop+0x180
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb4f04d30, rbp = 0 ---
db> ps
  pid  ppid  pgrp   uid  state  wmesg    wchan               cmd
77873 77871 76757     0  S+     piperd   0xffffff0010036ba0  as
77872 77871 76757     0  S+     zfs:(&zi 0xffffff0020ba0298  cc1plus
77871 77870 76757     0  S+     wait     0xffffff0001446000  c++
77870 77321 76757     0  S+     wait     0xffffff000eba7468  sh
77321 76882 76757     0  S+     wait     0xffffff00396448d0  sh
76882 76881 76757     0  S+     wait     0xffffff000eba5468  sh
76881 76757 76757     0  S+     wait     0xffffff00014b28d0  sh
76757 76755 76757     0  Ss+    wait     0xffffff001b369000  make
76755 62725 62725     0  S+     select   0xffffffff80a89d50  script
62725   817 62725     0  S+     wait     0xffffff0001f43000  initial thread
86757 86674 86757  1001  SL+    pfault   0xffffffff80a9a79c  ssh
86674 86672 86674  1001  Ss+    wait     0xffffff0001f428d0  bash
86672 86670 86670  1001  S      select   0xffffffff80a89d50  sshd
86670   721 86670     0  Ss     sbwait   0xffffff0001d2015c  sshd
62788   802 62788     0  S+     ttyin    0xffffff00013bac10  csh
46310   801 46310     0  ?+                                  csh
  817   800   817     0  S+     pause    0xffffff0001f420c0  csh
  807     1   807     0  Ss+    ttyin    0xffffff000139e810  getty
  806     1   806     0  Ss+    ttyin    0xffffff00013bb410  getty
  805     1   805     0  Ss+    ttyin    0xffffff00013ba810  getty
  804     1   804     0  Ss+    ttyin    0xffffff00013ba010  getty
  803     1   803     0  Ss+    ttyin    0xffffff00013b9410  getty
  802     1   802     0  Ss+    wait     0xffffff00014b08d0  login
  801     1   801     0  Ss+    wait     0xffffff0001567468  login
  800     1   800     0  Ss+    wait     0xffffff00014b0468  login
  737     1   737     0  ?s                                  cron
  731     1   731    25  Ss     pause    0xffffff00015650c0  sendmail
  727     1   727     0  ?s                                  sendmail
  721     1   721     0  Ss     select   0xffffffff80a89d50  sshd
  688   687   687   123  ?                                   ntpd
  687     1   687     0  Ss     select   0xffffffff80a89d50  ntpd
  668     1   668     0  Ss     select   0xffffffff80a89d50  powerd
  559     1   559     0  ?s                                  syslogd
  488     1   488     0  Ss     select   0xffffffff80a89d50  devd
  440     1   440     0  Ss     select   0xffffffff80a89d50  moused
  248     1   248     0  Ss     pause    0xffffff00015640c0  adjkerntz
  175     0     0     0  SL     zfs:(&tq 0xffffff0001583080  [zil_clean]
  174     0     0     0  SL     zfs:(&tq 0xffffff00015831c0  [zil_clean]
  173     0     0     0  SL     zfs:(&tq 0xffffff0001583300  [zil_clean]
  172     0     0     0  SL     zfs:(&tq 0xffffff0001583440  [zil_clean]
  171     0     0     0  SL     zfs:(&tq 0xffffff0001583580  [zil_clean]
  170     0     0     0  SL     zfs:(&tq 0xffffff00015836c0  [zil_clean]
  168     0     0     0  SL     zfs:(&tx 0xffffff000146c590  [txg_thread_enter]
  167     0     0     0  SL     zfs:(&zi 0xffffff000c717d58  [txg_thread_enter]
  166     0     0     0  SL     zfs:(&tx 0xffffff00014e0a40  [txg_thread_enter]
  165     0     0     0  SL     vgeom:io 0xffffff000145c410  [vdev:worker ad0s1d]
  164     0     0     0  SL     zfs:(&tq 0xffffff000158e300  [spa_zio_intr_5]
  163     0     0     0  SL     zfs:(&tq 0xffffff000158e300  [spa_zio_intr_5]
  162     0     0     0  SL     zfs:(&tq 0xffffff000158e1c0  [spa_zio_issue_5]
  161     0     0     0  SL     zfs:(&tq 0xffffff000158e1c0  [spa_zio_issue_5]
  160     0     0     0  SL     zfs:(&tq 0xffffff0001227d00  [spa_zio_intr_4]
  159     0     0     0  SL     zfs:(&tq 0xffffff0001227d00  [spa_zio_intr_4]
  158     0     0     0  SL     zfs:(&tq 0xffffff0001227bc0  [spa_zio_issue_4]
  157     0     0     0  SL     zfs:(&tq 0xffffff0001227bc0  [spa_zio_issue_4]
  156     0     0     0  SL     zfs:(&tq 0xffffff0001227a80  [spa_zio_intr_3]
  155     0     0     0  SL     zfs:(&tq 0xffffff0001227a80  [spa_zio_intr_3]
  154     0     0     0  SL     zfs:(&tq 0xffffff0001227940  [spa_zio_issue_3]
  153     0     0     0  SL     zfs:(&tq 0xffffff0001227940  [spa_zio_issue_3]
  152     0     0     0  SL     zfs:&vq- 0xffffff00015d0c88  [spa_zio_intr_2]
  151     0     0     0  SL     vmwait   0xffffffff80a9a79c  [spa_zio_intr_2]
  150     0     0     0  SL     vmwait   0xffffffff80a9a79c  [spa_zio_issue_2]
  149     0     0     0  SL     vmwait   0xffffffff80a9a79c  [spa_zio_issue_2]
  148     0     0     0  SL     zfs:(&tq 0xffffff0001227580  [spa_zio_intr_1]
  147     0     0     0  SL     zfs:(&tq 0xffffff0001227580  [spa_zio_intr_1]
  146     0     0     0  SL     zfs:(&tq 0xffffff0001227440  [spa_zio_issue_1]
  145     0     0     0  SL     zfs:(&tq 0xffffff0001227440  [spa_zio_issue_1]
  144     0     0     0  SL     zfs:(&tq 0xffffff00012271c0  [spa_zio_intr_0]
  143     0     0     0  SL     zfs:(&tq 0xffffff00012271c0  [spa_zio_intr_0]
  142     0     0     0  SL     zfs:(&tq 0xffffff0001227300  [spa_zio_issue_0]
  141     0     0     0  SL     zfs:(&tq 0xffffff0001227300  [spa_zio_issue_0]
   87     0     0     0  SL     vmwait   0xffffffff80a9a79c  [g_eli[1] mirror/swa]
   86     0     0     0  SL     vmwait   0xffffffff80a9a79c  [g_eli[0] mirror/swa]
   53     0     0     0  SL     sdflush  0xffffffff80a99d88  [softdepflush]
   52     0     0     0  SL     vlruwt   0xffffff0001448000  [vnlru]
   51     0     0     0  SL     zfs:&vq- 0xffffff00015d0c88  [syncer]
   50     0     0     0  SL     psleep   0xffffffff80a8a55c  [bufdaemon]
   49     0     0     0  SL     pgzero   0xffffffff80a9b804  [pagezero]
   48     0     0     0  SL     psleep   0xffffffff80a9ab48  [vmdaemon]
   47     0     0     0  SL     wswbuf0  0xffffffff80a9a004  [pagedaemon]
   46     0     0     0  SL     m:w1     0xffffff0001401200  [g_mirror swap]
   45     0     0     0  SL     m:w1     0xffffff00013c3800  [g_mirror root]
   44     0     0     0  SL     zfs:(&ar 0xffffffff80c746b0  [arc_reclaim_thread]
   43     0     0     0  SL     waiting_ 0xffffffff80a8dc88  [sctp_iterator]
   42     0     0     0  WL                                  [swi0: sio]
   41     0     0     0  WL                                  [irq1: atkbd0]
   40     0     0     0  WL                                  [irq15: ata1]
   39     0     0     0  WL                                  [irq14: ata0]
   38     0     0     0  SL     usbevt   0xffffff000133a420  [usb5]
   37     0     0     0  SL     usbevt   0xffffffff81065420  [usb4]
   36     0     0     0  SL     usbevt   0xffffffff81063420  [usb3]
   35     0     0     0  SL     usbevt   0xffffff000130c420  [usb2]
   34     0     0     0  WL                                  [irq22: ehci0]
   33     0     0     0  SL     usbevt   0xffffffff81061420  [usb1]
   32     0     0     0  WL                                  [irq21: pcm0 uhci1+]
   31     0     0     0  SL     usbtsk   0xffffffff80a71028  [usbtask-dr]
   30     0     0     0  SL     usbtsk   0xffffffff80a71000  [usbtask-hc]
   29     0     0     0  SL     usbevt   0xffffffff8105f420  [usb0]
   28     0     0     0  WL                                  [irq20: uhci0 uhci+]
   27     0     0     0  SL     -        0xffffff00012ef880  [em0 taskq]
   26     0     0     0  WL                                  [irq9: acpi0]
   25     0     0     0  SL     -        0xffffff0001294580  [kqueue taskq]
   24     0     0     0  WL                                  [swi6: task queue]
   23     0     0     0  RL     CPU 1                        [swi6: Giant taskq]
   22     0     0     0  SL     -        0xffffff000122c500  [thread taskq]
   21     0     0     0  WL                                  [swi5: +]
   20     0     0     0  SL     -        0xffffff000122ca80  [acpi_task_2]
   19     0     0     0  SL     -        0xffffff000122ca80  [acpi_task_1]
   18     0     0     0  SL     -        0xffffff000122ca80  [acpi_task_0]
   17     0     0     0  WL                                  [swi2: cambio]
    9     0     0     0  SL     ccb_scan 0xffffffff80a3fda0  [xpt_thrd]
   16     0     0     0  SL     -        0xffffffff80a74ea8  [yarrow]
    8     0     0     0  SL     crypto_r 0xffffffff80d293b0  [crypto returns]
    7     0     0     0  SL     crypto_w 0xffffffff80d29350  [crypto]
    6     0     0     0  SL     zfs:(&tq 0xffffff0001227080  [system_taskq]
    5     0     0     0  SL     zfs:(&tq 0xffffff0001227080  [system_taskq]
    4     0     0     0  SL     -        0xffffffff80a71838  [g_down]
    3     0     0     0  SL     -        0xffffffff80a71830  [g_up]
    2     0     0     0  SL     -        0xffffffff80a71820  [g_event]
   15     0     0     0  WL                                  [swi1: net]
   14     0     0     0  WL                                  [swi3: vm]
   13     0     0     0  LL     *Giant   0xffffff00015a2be0  [swi4: clock sio]
   12     0     0     0  RL     CPU 0                        [idle: cpu0]
   11     0     0     0  RL                                  [idle: cpu1]
    1     0     1     0  SLs    wait     0xffffff000112f8d0  [init]
   10     0     0     0  SL     audit_wo 0xffffffff80a99260  [audit]
    0     0     0     0  SLs    vmwait   0xffffffff80a9a79c  [swapper]
db> trace 77872
Tracing pid 77872 tid 100157 td 0xffffff000eb9c350
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_cv_wait() at _cv_wait+0xfe
zio_wait() at zio_wait+0x5f
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x1f6
dmu_buf_hold_array() at dmu_buf_hold_array+0x62
dmu_read_uio() at dmu_read_uio+0x3f
zfs_freebsd_read() at zfs_freebsd_read+0x535
vn_read() at vn_read+0x1ca
dofileread() at dofileread+0xa1
kern_readv() at kern_readv+0x4c
read() at read+0x54
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (3, FreeBSD ELF64, read), rip = 0x8bd74c, rsp = 0x7fffffffe2c8, rbp = 0 ---
db> trace 51
Tracing pid 51 tid 100050 td 0xffffff00014246a0
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_sx_xlock_hard() at _sx_xlock_hard+0x1ee
_sx_xlock() at _sx_xlock+0x4e
vdev_queue_io() at vdev_queue_io+0x74
vdev_geom_io_start() at vdev_geom_io_start+0x4a
vdev_mirror_io_start() at vdev_mirror_io_start+0x1b0
zil_lwb_write_start() at zil_lwb_write_start+0x2f1
zil_commit_writer() at zil_commit_writer+0x1c4
zil_commit() at zil_commit+0xb8
zfs_sync() at zfs_sync+0x9a
sync_fsync() at sync_fsync+0x1ac
sched_sync() at sched_sync+0x63f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb502fd30, rbp = 0 ---

Any ideas about this?

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 13:22:56 2008
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 771851065672 for ; Mon, 31 Mar 2008 13:22:56 +0000 (UTC) (envelope-from gary.jennejohn@freenet.de)
Received: from mout4.freenet.de (mout4.freenet.de [IPv6:2001:748:100:40::2:6]) by mx1.freebsd.org (Postfix) with ESMTP id 7BB608FC25 for ; Mon, 31 Mar 2008 13:22:55 +0000 (UTC) (envelope-from gary.jennejohn@freenet.de)
Received: from [195.4.92.14] (helo=4.mx.freenet.de) by mout4.freenet.de with esmtpa (Exim 4.69) (envelope-from ) id 1JgJyP-0002Ry-GI; Mon, 31 Mar 2008 15:22:54 +0200
Received: from x1b6f.x.pppool.de ([89.59.27.111]:35965 helo=peedub.jennejohn.org) by 4.mx.freenet.de with esmtpa (ID gary.jennejohn@freenet.de) (port 25) (Exim 4.69 #12) id 1JgJyO-0007Cv-Sn; Mon, 31 Mar 2008 15:22:53 +0200
Date: Mon, 31 Mar 2008 15:22:51 +0200
From: Gary Jennejohn
To: Attila Nagy
Message-ID: <20080331152251.62526181@peedub.jennejohn.org>
In-Reply-To: <47F0D02B.8060504@fsn.hu>
References: <47F0D02B.8060504@fsn.hu>
X-Mailer: Claws Mail 3.3.1 (GTK+ 2.10.14; amd64-portbld-freebsd8.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS hangs very often
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: gary.jennejohn@freenet.de
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 13:22:56 -0000

On Mon, 31 Mar 2008 13:51:07 +0200
Attila Nagy wrote:

> Hello,
>
> On my desktop machine I use a ZFS pool for everything but the swap and
> the root fs (so /usr, /tmp, and everything else is on ZFS; swap and /
> are on a gmirror of two partitions each).
>
> The first setup was FreeBSD/i386 7-STABLE, and the pool consisted of
> two partitions from two SATA disks, each encrypted individually with
> GELI.
>
> After some weeks of use (without any particularly heavy IO activity,
> just normal work on the machine), the first hang came: I couldn't move
> the mouse under X, but remote sessions stayed alive and the clock app
> kept counting the time fine. I couldn't log into the machine via ssh:
> the port was open, but I never got the banner. I was running a
> portupgrade at the time.
>
> After this (and several other) hangs, I removed GELI from the equation,
> without success. Then I went to one partition (disk) instead of two.
> Now I am running amd64 instead of i386, and the problem still persists.
>
> [full db> bt, ps and trace output quoted verbatim from the original
> message; trimmed]
>

I quote the entire email to preserve context, although that seems rather
excessive.

I can only say that I've observed hangs like this at the same location
in the kernel as in the first stack trace (_cv_wait -> sleepq_wait).

Since I don't have important file systems like /usr, /var etc. under
ZFS, I've always been able to recover by
a) raising the priority of the blocked process with nice, and
b) then killing the process.

Strangely enough, I've always been able to access the file system on
which the process was blocked (e.g. with ls) from a different terminal.
So the hang seems to be limited to the one process, and not to be a
symptom of ZFS itself wedging. Or maybe it's just that my ls was
accessing different parts of the filesystem not covered by the CV. FIIK.

Otherwise I have no idea what's going on. I mentioned this some time ago
(months?) on -current but never got any response. I didn't have any nice
trace then, though.
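[Archive note: the two-step recovery Gary describes (raise the blocked process's priority, then kill it) can be sketched roughly as below. This is only an illustration, not from the original thread: a background sleep stands in for the blocked process, and on a real hang you would substitute the stuck PID found via db> ps or top, e.g. 77872 for the hung cc1plus above.]

```shell
# Sketch of the recovery procedure described above. A background sleep
# stands in for the process blocked in ZFS.
sleep 300 &
PID=$!

# a) raise the process's scheduling priority with renice(8); negative
#    nice values need root, so ignore the failure when unprivileged
renice -n -10 -p "$PID" 2>/dev/null || true

# b) then kill it, escalating to SIGKILL if SIGTERM is not enough
kill -TERM "$PID"
sleep 1
if kill -0 "$PID" 2>/dev/null; then
    kill -KILL "$PID"
fi
wait "$PID" 2>/dev/null || true
echo "process $PID is gone"
```

Whether this works at all depends on why the process is blocked; a thread stuck uninterruptibly inside the kernel (state SL/D) may ignore signals until the underlying IO completes.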
---
Gary Jennejohn

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 13:58:01 2008
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5B988106564A for ; Mon, 31 Mar 2008 13:58:01 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 720038FC33 for ; Mon, 31 Mar 2008 13:57:59 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from [172.16.151.53] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id B4DE3AD834; Mon, 31 Mar 2008 15:57:47 +0200 (CEST)
Message-ID: <47F0EDD6.8060402@fsn.hu>
Date: Mon, 31 Mar 2008 15:57:42 +0200
From: Attila Nagy
User-Agent: Thunderbird 2.0.0.12 (Windows/20080213)
MIME-Version: 1.0
To: gary.jennejohn@freenet.de
References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org>
In-Reply-To: <20080331152251.62526181@peedub.jennejohn.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS hangs very often
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 31 Mar 2008 13:58:01 -0000

On 2008.03.31. 15:22, Gary Jennejohn wrote:
> On Mon, 31 Mar 2008 13:51:07 +0200
> Attila Nagy wrote:
>
>> [the original report and its full db> bt, ps and trace output were
>> quoted here in full again; trimmed]
>
> I quote the entire email to preserve context, although that seems rather
> excessive.
>
> I can only say that I've observed hangs like this at the same location
> in the kernel as in the first stack trace (_cv_wait -> sleepq_wait).
>
> Since I don't have important file systems like /usr, /var etc.
under ZFS > I've always been able to recover by > a) raising the priority of the blocked process with nice > b) then killing the process > > Strangely enough I've always been able to access the file system on which > the process was blocked (e.g. ls) from a different terminal. So the hang seems > to be limited to only the one process and not to be a symptom of ZFS itself > wedging. Or maybe it's just that my ls was accessing different parts of the > filesystem not covered by the CV? FIIK. > > Otherwise I have no idea what's going on. > > I mentioned this some time ago (months?) on -current but never got any > response. I didn't have any nice trace, though. > My system completely locks up: I can't start new processes, but running ones - which don't do I/O - can continue (for example a top). I don't know the ZFS internals (BTW, /usr and the others are of course different ZFS filesystems on the pool), but it may be that something major gets locked, and that's why it stops here. Anyway, if somebody can help to track this down, I'm here to try patches or do experiments. Thanks, ps: a -CURRENT from around a month or a month and a half ago still has this problem. 
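[Editorial note: the two-step recovery Gary quotes above (raise the blocked process's priority with nice, then kill it) can be sketched as a small shell fragment. This is a hedged illustration, not from the original mail; the function name and the default nice delta are invented for the example, and a negative delta requires root.]

```shell
# Sketch of the recovery steps described above, under the assumption
# that you know the PID of the process stuck in ZFS (e.g. from ps or
# the ddb output).  recover_blocked is a hypothetical helper name.
recover_blocked() {
    pid=$1
    delta=${2:--5}               # default: raise priority (needs root)
    renice -n "$delta" -p "$pid" # a) adjust the blocked process's priority
    kill -TERM "$pid"            # b) then kill the process
}
```

Usage would be something like `recover_blocked 77872` for the blocked reader in the trace above; escalating to `kill -KILL` may be needed if SIGTERM is ignored.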
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 31 14:15:14 2008 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67A17106564A for ; Mon, 31 Mar 2008 14:15:14 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 248C18FC17 for ; Mon, 31 Mar 2008 14:15:13 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from [172.16.151.53] (fw.axelero.hu [195.228.243.120]) by people.fsn.hu (Postfix) with ESMTP id 8BA1FADB00; Mon, 31 Mar 2008 16:15:06 +0200 (CEST) Message-ID: <47F0F1E8.1080504@fsn.hu> Date: Mon, 31 Mar 2008 16:15:04 +0200 From: Attila Nagy User-Agent: Thunderbird 2.0.0.12 (Windows/20080213) MIME-Version: 1.0 To: gary.jennejohn@freenet.de References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> In-Reply-To: <47F0EDD6.8060402@fsn.hu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Mar 2008 14:15:14 -0000 On 2008.03.31. 15:57, Attila Nagy wrote: > My system completely locks up, I can't start new processes, but > runnings ones -which don't do IO- can continue (for example a top). > I don't know ZFS internals (BTW, /usr and others are of course > different ZFS filesystems on the pool), but it might be, that > something major gets locked and that's why it stops here. > > Anyways, if somebody can help to back this out, I'm here to try > patches, or do experiments. 
I forgot to tell -I don't know, maybe it's important-, that I have an SMP box (but tried with UP kernel, the effect is the same) and compression is enabled on every filesystems. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 04:46:50 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E85751065689; Tue, 1 Apr 2008 04:46:49 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 24C488FC23; Tue, 1 Apr 2008 04:46:48 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id DC1CF4D5CC0; Tue, 1 Apr 2008 12:28:09 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id 415B755E4AC for ; Thu, 27 Mar 2008 13:37:20 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id B033215637B; Thu, 27 Mar 2008 05:36:44 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id E74D91065768; Thu, 27 Mar 2008 05:36:42 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 493011065670 for ; Thu, 27 Mar 2008 05:36:33 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 069F28FC1E for ; Thu, 27 Mar 2008 05:36:32 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Jekmm-0004CW-7x for freebsd-current@freebsd.org; Thu, 27 Mar 2008 05:36:24 +0000 Received: 
from 195.208.174.178 ([195.208.174.178]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 05:36:24 +0000 Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 05:36:24 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-current@freebsd.org From: Vadim Goncharov Followup-To: gmane.os.freebsd.current Date: Thu, 27 Mar 2008 05:36:15 +0000 (UTC) Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel Lines: 22 Message-ID: References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 195.208.174.178 X-Comment-To: Robert Watson User-Agent: slrn/0.9.8.1 (FreeBSD) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org Reply-To: vadim_nuclight@mail.ru List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 04:46:50 -0000 Hi Robert Watson! On Wed, 26 Mar 2008 14:53:25 +0000 (GMT); Robert Watson wrote about 'Re: unionfs status': > You can imagine a number of schemes to replicate pointer changes around or > track the various outstanding references, but I think a more fundamental > question is whether this is in fact the right behavior at all. The premise of > is that writes flow up, but not down, and "connections" to sockets are > read-write events, not read events, most typically. If you're using unionfs > to take a template system and "broadcast it" to many jails, you probably don't > want all the jails talking to the same syslogd, you want them each talking to > their own. 
When syslogd in a jail finds a disconnected socket, which is > effectively what a NULL v_socket pointer means, in /var/run/log, it should be > unlinking it and creating a new socket, not reusing the existing file on disk. This code's use in jails is primarily intended for mysql (and the like daemons), not syslogd (for which you said it right). Such daemons really require broadcasting, yep - so unionfs should support it... -- WBR, Vadim Goncharov. ICQ#166852181 mailto:vadim_nuclight@mail.ru [Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight] _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 04:53:21 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 89B1A1065670; Tue, 1 Apr 2008 04:53:21 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 10E8C8FC28; Tue, 1 Apr 2008 04:53:21 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 226574D626B; Tue, 1 Apr 2008 12:33:53 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id 8592755E4A8 for ; Thu, 27 Mar 2008 21:58:24 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id D08991A8F09; Thu, 27 Mar 2008 13:56:57 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 5A3F21065716; Thu, 27 Mar 2008 13:56:55 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) 
Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B67951065676; Thu, 27 Mar 2008 13:56:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 72A898FC34; Thu, 27 Mar 2008 13:56:41 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 4949846B96; Thu, 27 Mar 2008 09:56:40 -0400 (EDT) Date: Thu, 27 Mar 2008 13:56:40 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Vadim Goncharov In-Reply-To: Message-ID: <20080327135318.R73942@fledge.watson.org> References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> <20080327062556.GE3180@home.opsec.eu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 04:53:21 -0000 On Thu, 27 Mar 2008, Vadim Goncharov wrote: >> Thanks for this description. So we basically have two different uses for >> UNIX sockets in unionfs with jails ? > >> 1) socket in jail to communicate only inside one jail (syslog-case) 2) >> socket in jail as a means of IPC between different jails (mysql-case) > >> Is 2) really supposed to work like this ? > > This is user's/admin's point of view, that it should work this way: one > mysql with one socket for several jails. I don't know all gory details about > how code really works. 
As I see it, nullfs should provide a shared socket, it is intended to provide access to the same object, and unionfs should provide independent sockets, as unionfs is intended to provide isolation. Robert N M Watson Computer Laboratory University of Cambridge _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:05:51 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 49D6210661EC for ; Tue, 1 Apr 2008 05:05:51 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 7D4F08FC1C for ; Tue, 1 Apr 2008 05:05:50 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id D3A094D52EA; Tue, 1 Apr 2008 12:24:25 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id 0F31F55E498 for ; Thu, 27 Mar 2008 14:55:55 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 60193162CBA; Thu, 27 Mar 2008 06:55:18 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 66573106566B; Thu, 27 Mar 2008 06:55:14 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 64518106566C for ; Thu, 27 Mar 2008 06:55:05 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org 
[80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1E52B8FC20 for ; Thu, 27 Mar 2008 06:55:04 +0000 (UTC) (envelope-from freebsd-current@m.gmane.org) Received: from root by ciao.gmane.org with local (Exim 4.43) id 1Jem0s-0007K2-V4 for freebsd-current@freebsd.org; Thu, 27 Mar 2008 06:55:02 +0000 Received: from 195.208.174.178 ([195.208.174.178]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 06:55:02 +0000 Received: from vadim_nuclight by 195.208.174.178 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 27 Mar 2008 06:55:02 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-current@freebsd.org From: Vadim Goncharov Followup-To: gmane.os.freebsd.current Date: Thu, 27 Mar 2008 06:51:37 +0000 (UTC) Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel Lines: 30 Message-ID: References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> <20080327062556.GE3180@home.opsec.eu> X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 195.208.174.178 X-Comment-To: Kurt Jaeger User-Agent: slrn/0.9.8.1 (FreeBSD) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org Reply-To: vadim_nuclight@mail.ru List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:05:51 -0000 Hi Kurt Jaeger! On Thu, 27 Mar 2008 07:25:56 +0100; Kurt Jaeger wrote about 'Re: unionfs status': >>> If you're using unionfs >>> to take a template system and "broadcast it" to many jails, you probably don't >>> want all the jails talking to the same syslogd, you want them each talking to >>> their own. 
When syslogd in a jail finds a disconnected socket, which is >>> effectively what a NULL v_socket pointer means, in /var/run/log, it should be >>> unlinking it and creating a new socket, not reusing the existing file on disk. >> This code's use in jails is primarily intended for mysql (and the like >> daemons), not syslogd (for which you said it right). Such daemons really >> require broadcasting, yep - so unionfs should support it... > Thanks for this description. So we basically have two different > uses for UNIX sockets in unionfs with jails ? > 1) socket in jail to communicate only inside one jail (syslog-case) > 2) socket in jail as a means of IPC between different jails (mysql-case) > Is 2) really supposed to work like this ? This is user's/admin's point of view, that it should work this way: one mysql with one socket for several jails. I don't know all gory details about how code really works. -- WBR, Vadim Goncharov. ICQ#166852181 mailto:vadim_nuclight@mail.ru [Moderator of RU.ANTI-ECOLOGY][FreeBSD][http://antigreen.org][LJ:/nuclight] _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:05:54 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 370B710662AA for ; Tue, 1 Apr 2008 05:05:54 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id B3BA08FC14 for ; Tue, 1 Apr 2008 05:05:53 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 0DA264D53DC; Tue, 1 Apr 2008 12:24:53 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my 
(Postfix) with ESMTP id 14BBF55E4F9 for ; Thu, 27 Mar 2008 14:28:37 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 9475F1A50D9; Thu, 27 Mar 2008 06:26:06 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 9B0331065676; Thu, 27 Mar 2008 06:26:06 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 02E45106566C; Thu, 27 Mar 2008 06:25:58 +0000 (UTC) (envelope-from lists@c0mplx.org) Received: from home.opsec.eu (unknown [IPv6:2001:14f8:200::1]) by mx1.freebsd.org (Postfix) with ESMTP id B34128FC15; Thu, 27 Mar 2008 06:25:57 +0000 (UTC) (envelope-from lists@c0mplx.org) Received: from pi by home.opsec.eu with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1JelYi-000CgM-Qt; Thu, 27 Mar 2008 07:25:56 +0100 Date: Thu, 27 Mar 2008 07:25:56 +0100 From: Kurt Jaeger To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org Message-ID: <20080327062556.GE3180@home.opsec.eu> References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:05:54 -0000 Vadim Goncharov wrote: > Robert Watson wrote: > > If you're using unionfs > > to take a template system and "broadcast it" to many jails, you probably don't > > want all the jails talking to the 
same syslogd, you want them each talking to > > their own. When syslogd in a jail finds a disconnected socket, which is > > effectively what a NULL v_socket pointer means, in /var/run/log, it should be > > unlinking it and creating a new socket, not reusing the existing file on disk. > This code's use in jails is primarily intended for mysql (and the like > daemons), not syslogd (for which you said it right). Such daemons really > require broadcasting, yep - so unionfs should support it... Thanks for this description. So we basically have two different uses for UNIX sockets in unionfs with jails ? 1) socket in jail to communicate only inside one jail (syslog-case) 2) socket in jail as a means of IPC between different jails (mysql-case) Is 2) really supposed to work like this ? -- pi@opsec.eu +49 171 3101372 12 years to go ! _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:09:29 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 031301066FA8; Tue, 1 Apr 2008 05:09:29 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id EC1B08FC1B; Tue, 1 Apr 2008 05:09:27 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 0FAA44D58D7; Tue, 1 Apr 2008 12:11:03 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id E4B1755E491 for ; Wed, 26 Mar 2008 23:59:54 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id ADC4C1A5B7C; Wed, 26 Mar 2008 15:59:17 
+0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id C472E106573A; Wed, 26 Mar 2008 15:59:16 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A695F106566B; Wed, 26 Mar 2008 15:59:01 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 7BA3F8FC12; Wed, 26 Mar 2008 15:59:01 +0000 (UTC) (envelope-from daichi@freebsd.org) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTP id 8F299125438; Thu, 27 Mar 2008 00:39:20 +0900 (JST) Message-ID: <47EA6E27.3060006@freebsd.org> Date: Thu, 27 Mar 2008 00:39:19 +0900 From: Daichi GOTO User-Agent: Thunderbird 2.0.0.12 (X11/20080325) MIME-Version: 1.0 To: "Alexander V. Chernikov" References: <47E9448F.1010304@ipfw.ru> In-Reply-To: <47E9448F.1010304@ipfw.ru> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@FreeBSD.org, Kurt Jaeger , Robert Watson , dindin@yandex-team.ru Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:09:29 -0000 I should say that so sorry of my slow response. so sorry. We are developing unionfs step by step and still have 5 next patches. 
http://people.freebsd.org/~daichi/unionfs/experiments/
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-1.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-2.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-3.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-4.diff
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-p20-5.diff
p20-1: fixes a panic triggered when "no error happens, eofflag is 0, response data is empty and DIAGNOSTIC is defined" while invoking VOP_READDIR(9) from unionfs. This change also fixes a system hang-up when used with NFS.
p20-2: fixes an fs access issue when mounting on devfs.
p20-3: fixes kern/109377.
p20-4: fixes a rename panic issue.
p20-5: fixes a unix socket connection issue.
In our long-running unionfs tests, it looks like it works very well. Would you try the above patches? Sorry again for my slow response; please accept my deepest apology. We are planning to commit the above patches to 8-CURRENT. 7-RELEASE has been done, so it is a good time to commit them to CURRENT ;) Alexander V. Chernikov wrote: > Hello people! 
> > At this moment unionfs has got at least following problems: > 1) File systems cannot mount onto upper/lower unionfs layer (partially > described in kern/117829) > 2) There are problems with multithreaded programs accessing(writing) > files on unionfs (kern/109950) > 3) As well there are problems with accessing unix sockets created on > upper/lower unionfs layers (kern/118346) > 4) Doing mv filename same-filename causes kernel to panic on 6.X (and > printing warning about VOP_RENAME in 7+) > 5) Making 'loops' when mounting unionfs causes kernel panic (kern/121385) > > I have made patches solving first 4 problems > These patches are available at http://ipfw.ru/patches/ > unionfs2.diff fixes fs mounting onto upper layer, unionfs_lmount.diff > fixes lower > unionfs_threads.diff and unionfs_unix.diff fixes cases 2) and 3) > unionfs_rename.diff fixes case with renaming > > Can anybody comment/review ? -- Daichi GOTO, http://people.freebsd.org/~daichi _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:09:29 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 047541066FA9 for ; Tue, 1 Apr 2008 05:09:29 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id CC9988FC18 for ; Tue, 1 Apr 2008 05:09:27 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id 037FB4D586E; Tue, 1 Apr 2008 12:10:58 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id E14D455E48B for ; Wed, 26 Mar 2008 22:55:10 +0800 (MYT) Received: from 
hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id B7FDA1A5CCF; Wed, 26 Mar 2008 14:53:40 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 3017E1065707; Wed, 26 Mar 2008 14:53:39 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3E5D106566B; Wed, 26 Mar 2008 14:53:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 649518FC1F; Wed, 26 Mar 2008 14:53:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 2808C46B04; Wed, 26 Mar 2008 10:53:25 -0400 (EDT) Date: Wed, 26 Mar 2008 14:53:25 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: "Alexander V. Chernikov" In-Reply-To: <47E9448F.1010304@ipfw.ru> Message-ID: <20080326142115.K34007@fledge.watson.org> References: <47E9448F.1010304@ipfw.ru> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@FreeBSD.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:09:29 -0000 On Tue, 25 Mar 2008, Alexander V. 
Chernikov wrote: > I have made patches solving first 4 problems These patches are available at > http://ipfw.ru/patches/ unionfs2.diff fixes fs mounting onto upper layer, > unionfs_lmount.diff fixes lower unionfs_threads.diff and unionfs_unix.diff > fixes cases 2) and 3) unionfs_rename.diff fixes case with renaming > > Can anybody comment/review ? Dear Alexander, Unfortunately, I don't know too much about unionfs. However, I can comment on the UNIX domain socket patch: > --- sys/fs/unionfs/union_subr.c.orig 2008-03-13 23:10:32.000000000 +0300 > +++ sys/fs/unionfs/union_subr.c 2008-03-13 23:17:34.000000000 +0300 > @@ -160,6 +160,8 @@ > unp->un_path[cnp->cn_namelen] = '\0'; > } > vp->v_type = (uppervp != NULLVP ? uppervp->v_type : lowervp->v_type); > + if (vp->v_type == VSOCK) > + vp->v_socket = (uppervp != NULLVP) ? uppervp->v_socket : lowervp->v_socket; > if ((lowervp != NULLVP) && (lowervp->v_type == VDIR)) > vp->v_mountedhere = lowervp->v_mountedhere; > vp->v_data = unp; I'm a bit worried about this assignment, as it represents an untracked alias for the socket. Let me explain why: UNIX domain sockets may have file system bindings, allowing them to use the file system namespace as a rendezvous for communication. Typical use is that a socket is created, bind() is called on it with a path in some location like /var/run/log. Other processes turn up and connect() to the path, causing a file system lookup to reach the vnode of the socket, and then the socket code follows vp->v_socket to find the socket to connect to. When a bound socket is closed, we follow a back-pointer from the UNIX domain socket to the vnode, and then clear the pointer. Doing this in a race-free manner is somewhat tricky, and I'm not 100% convinced it's correct currently, although it appears to be somewhat close to right. 
The upshot of all this is that if you copy the pointer value to other vnodes, such as vnodes on the upper layer, the UNIX domain socket code won't clear those pointers before freeing the socket they point at. This means that the above code snippet may lead to a v_socket pointer on a higher-layer vnode pointing at the right socket, the wrong socket, or possibly some other bit of freed and maybe reused memory. You can imagine a number of schemes to replicate pointer changes around or track the various outstanding references, but I think a more fundamental question is whether this is in fact the right behavior at all. The premise of unionfs is that writes flow up, but not down, and "connections" to sockets are read-write events, not read events, most typically. If you're using unionfs to take a template system and "broadcast it" to many jails, you probably don't want all the jails talking to the same syslogd; you want them each talking to their own. When syslogd in a jail finds a disconnected socket, which is effectively what a NULL v_socket pointer means, in /var/run/log, it should be unlinking it and creating a new socket, not reusing the existing file on disk.
Robert N M Watson Computer Laboratory University of Cambridge _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 05:24:49 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C34A410656CE; Tue, 1 Apr 2008 05:24:49 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: from staff.cyber.mmu.edu.my (staff.cyber.mmu.edu.my [203.106.62.12]) by mx1.freebsd.org (Postfix) with ESMTP id 1C7968FC22; Tue, 1 Apr 2008 05:24:49 +0000 (UTC) (envelope-from root@mmu.edu.my) Received: by staff.cyber.mmu.edu.my (Postfix, from userid 0) id C6FFB4D50C4; Tue, 1 Apr 2008 13:17:26 +0800 (MYT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mmu.edu.my (Postfix) with ESMTP id BDCF855E487 for ; Fri, 28 Mar 2008 03:23:43 +0800 (MYT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 7BF2C1A72A1; Thu, 27 Mar 2008 19:22:26 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 8773F106576A; Thu, 27 Mar 2008 19:22:23 +0000 (UTC) (envelope-from owner-freebsd-current@freebsd.org) Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 939D5106566B for ; Thu, 27 Mar 2008 19:22:13 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outB.internet-mail-service.net (outb.internet-mail-service.net [216.240.47.225]) by mx1.freebsd.org (Postfix) with ESMTP id 8088A8FC23 for ; Thu, 27 Mar 2008 19:22:13 +0000 (UTC) (envelope-from julian@elischer.org) Received: from mx0.idiom.com 
(HELO idiom.com) (216.240.32.160) by out.internet-mail-service.net (qpsmtpd/0.40) with ESMTP; Thu, 27 Mar 2008 17:44:19 -0700 Received: from julian-mac.elischer.org (localhost [127.0.0.1]) by idiom.com (Postfix) with ESMTP id 4EAA12D6010; Thu, 27 Mar 2008 12:22:12 -0700 (PDT) Message-ID: <47EBF3E4.4000607@elischer.org> Date: Thu, 27 Mar 2008 12:22:12 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Kurt Jaeger References: <47E9448F.1010304@ipfw.ru> <20080326142115.K34007@fledge.watson.org> <20080327062556.GE3180@home.opsec.eu> In-Reply-To: <20080327062556.GE3180@home.opsec.eu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-current@freebsd.org Errors-To: owner-freebsd-current@freebsd.org Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: unionfs status X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 05:24:49 -0000 Kurt Jaeger wrote: > Vadim Goncharov wrote: >> Robert Watson wrote: > >>> If you're using unionfs >>> to take a template system and "broadcast it" to many jails, you probably don't >>> want all the jails talking to the same syslogd, you want them each talking to >>> their own. When syslogd in a jail finds a disconnected socket, which is >>> effectively what a NULL v_socket pointer means, in /var/run/log, it should be >>> unlinking it and creating a new socket, not reusing the existing file on disk. > >> This code's use in jails is primarily intended for mysql (and similar >> daemons), not syslogd (about which you are right). Such daemons really >> require broadcasting, yep - so unionfs should support it... > > Thanks for this description.
So we basically have two different > uses for UNIX sockets in unionfs with jails? > > 1) socket in jail to communicate only inside one jail (syslog-case) > 2) socket in jail as a means of IPC between different jails (mysql-case) > > Is 2) really supposed to work like this? Think about it: the socket is a file interface to a process. If you are reading the same socket, you expect to get the same process. In (1) you put the socket somewhere not shared; in (2) you put the socket somewhere shared. In nullfs you are allowing access to the same vnode via several namespace positions, so a new socket is visible to all jails. In unionfs a new socket would replace the old one and thus be only locally visible (it refers to a different vnode from those accessed by the same name in other mounts). > _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 07:54:44 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 098671065670 for ; Tue, 1 Apr 2008 07:54:44 +0000 (UTC) (envelope-from wangyi6854@gmail.com) Received: from ti-out-0910.google.com (ti-out-0910.google.com [209.85.142.187]) by mx1.freebsd.org (Postfix) with ESMTP id 8CDE38FC2D for ; Tue, 1 Apr 2008 07:54:43 +0000 (UTC) (envelope-from wangyi6854@gmail.com) Received: by ti-out-0910.google.com with SMTP id j2so619585tid.3 for ; Tue, 01 Apr 2008 00:54:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=DJW7dMd74MIdDXgD33AMUdw6IjftB2ZRGg2PHvDAybU=;
b=cesp+97OYGZm/XliwUrwCb+63W46AyJ7u5S3q/7BHClJYSv1RicWH5BNFtgrUwD0wbBClWFje/ElgyKB8IMRH7b+wiSrhxPLfH5X7Vw53xP4Mr7R+RUT6p97toVDMfDFP7r/0OeD56kYSa+iw0rrX4z4VFbFKBfYplse7hG50Ic= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=bNwpShVcS6Kzw8Ku2mqPB7kVBvAzYhTzILfCz12WbSqSFyf2DsjxYMaWHvLCWJJfM2aPAnjKNKSY6FTHgVAOdghyrAp2r8xbfylwuR8wv3KNaQRBXQ2e2bYH6w1xdwBY/JkDDfT3+rKpLZL6M2jBtSEoytrqai04O7cGisgVol4= Received: by 10.110.31.11 with SMTP id e11mr3341169tie.56.1207034859245; Tue, 01 Apr 2008 00:27:39 -0700 (PDT) Received: by 10.110.10.14 with HTTP; Tue, 1 Apr 2008 00:27:39 -0700 (PDT) Message-ID: <5ea5cca50804010027k51b59658mb28a481c516e84b0@mail.gmail.com> Date: Tue, 1 Apr 2008 15:27:39 +0800 From: "Yi Wang" To: "Attilio Rao" In-Reply-To: <3bbf2fe10802061700p253e68b8s704deb3e5e4ad086@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <3bbf2fe10802061700p253e68b8s704deb3e5e4ad086@mail.gmail.com> Cc: Yar Tikhiy , Doug Barton , Jeff Roberson , freebsd-fs@freebsd.org, Scot Hetzel , freebsd-arch@freebsd.org Subject: Re: [RFC] Remove NTFS kernel support X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 07:54:44 -0000 On 2/7/08, Attilio Rao wrote: > As exposed by several users, NTFS seems to have been broken even before the first > VFS commits happening around the end of December. Those commits exposed > some problems with NTFS which are currently under investigation. > Ultimately, this filesystem is also unmaintained at the moment. > > Speaking with jeff, we agreed on what could be a possible compromise: > remove the kernel support for NTFS and maybe take care of the FUSE > implementation.
> What I now propose is a small survey which can shed some light > on what you think about this idea and its implications: > - Do you use NTFS? Yes. I have a dual-boot machine. > - Are you interested in maintaining it? No. I'm not familiar with kernel/fs programming. > - Do you know a good reason not to use the FUSE ntfs implementation? What > does the kernel counterpart add? Yes: listening to music and watching video on NTFS disks stalls frequently when using ntfs-3g. What the kernel counterpart adds, I've no idea. > - Do you think axing the kernel support is a good idea? For servers, yes. For desktops, NO! > > Thanks, > Attilio > > > > -- > Peace can only be achieved by understanding - A. Einstein > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > -- Regards, Wang Yi From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 20:15:56 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 603751065676 for ; Tue, 1 Apr 2008 20:15:56 +0000 (UTC) (envelope-from crahman@gmail.com) Received: from gv-out-0910.google.com (gv-out-0910.google.com [216.239.58.184]) by mx1.freebsd.org (Postfix) with ESMTP id E07358FC2A for ; Tue, 1 Apr 2008 20:15:55 +0000 (UTC) (envelope-from crahman@gmail.com) Received: by gv-out-0910.google.com with SMTP id n40so441543gve.39 for ; Tue, 01 Apr 2008 13:15:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; bh=pbI5ViE5a2V95xufsRG0mCZHe6Bh/gcosBz0qc72gbE=; b=MyvLSjPsKYorNPN8IRxeZOQa0AJJ5ypVzACFHYvIslWWoQqsg1SFntzvHsqI9PgweM7CL0PpHIJMn1d61s9L9k4VdL/vdbjEuTJcIv781Qs+7QMr1FHRMBpCezpYysx6AxtACuvVGI6CrQsCiBlw1pJ1AJS5VtwGQpxhYlVpOLA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=gJoZwdu0AM/SUPfSyG0tnbCTzCZHj4HPIOgq5+zgON1aYp64umCTteo3iEvkrm8hj/7mW288qB/KvLbd4YaNEdj+ObEPshn/9sWzsswwecDBZseWiIofIjuaXEVkfx10FVwOY37Lp1zlsZo0tDsO5QIuXl5I2JwKUctSdUljcDY= Received: by 10.142.240.9 with SMTP id n9mr5273269wfh.136.1207079487231; Tue, 01 Apr 2008 12:51:27 -0700 (PDT) Received: by 10.142.188.17 with HTTP; Tue, 1 Apr 2008 12:51:27 -0700 (PDT) Message-ID: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> Date: Tue, 1 Apr 2008 13:51:27 -0600 From: "Cyrus Rahman" To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Subject: Trouble with snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 20:15:56 -0000 I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE right now. I haven't been able to set up a test environment to really determine precise details, but this much I know: Filesystem i/o will eventually lock up, requiring a hard reset, after the snapshot mount sleeps permanently on suspfs. Eventually there's a cascade and everything ends up waiting on suspfs. Running a 'sync' after mount hangs is a sure way to propagate the problem. This happens very often - probably 15% probability per snapshot on the server running 7.0. It's bad enough so that it's not realistic to use snapshots there. Other strange things have been observed, in that an entire day's worth of work vanished - after the reset/reboot the filesystems were consistent, but in the state they were in many hours before, at the time the snapshot hung. 
The snapshot had been observed hanging, but everything else seemed to work, so a decision was made to reboot at the end of the day - with disastrous effect! During the day nothing unusual except for the hung snapshot was noticed. I'm guessing everything just got cached (for hours!) and the cache never got flushed. This is happening on a system set up with journaled ufs filesystems, so that may be part of the problem. The system is running amd64 with an Intel Q6600. The filesystem that has trouble with this has a number of large files of about 500-700 MB on it. Filesystems with only small files do not seem to have trouble, even though they are bigger filesystems with more files. I can't think of anything else unique about it. From owner-freebsd-fs@FreeBSD.ORG Tue Apr 1 20:19:10 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 737431065673; Tue, 1 Apr 2008 20:19:10 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D18048FC1C; Tue, 1 Apr 2008 20:19:08 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <47F298C2.7040606@FreeBSD.org> Date: Tue, 01 Apr 2008 22:19:14 +0200 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213) MIME-Version: 1.0 To: Cyrus Rahman References: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> In-Reply-To: <9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: Trouble with snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 20:19:10 -0000 Cyrus Rahman wrote:
> I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE > right now. I haven't been able to set up a test environment to really > determine precise details, but this much I know: Filesystem i/o will > eventually lock up, requiring a hard reset, after the snapshot mount > sleeps permanently on suspfs. Eventually there's a cascade and > everything ends up waiting on suspfs. Running a 'sync' after mount > hangs is a sure way to propagate the problem. This happens very often > - probably 15% probability per snapshot on the server running 7.0. > It's bad enough so that it's not realistic to use snapshots there. > Other strange things have been observed, in that an entire day's worth > of work vanished - after the reset/reboot the filesystems were consistent, > but in the state they were in many hours before, at the time the snapshot > hung. The snapshot had been observed hanging, but everything else seemed > to work so a decision was made to reboot at the end of the day - with > disastrous effect! During the day nothing unusual except for the hung > snapshot was noticed. I'm guessing everything just got cached (for > hours!) and the cache never got flushed. > > This is happening on a system set up with journaled ufs filesystems, > so that may be part of the problem. The system is running amd64 with > an Intel Q6600. I thought gjournal and soft updates were supposed to be mutually exclusive (the latter is required for UFS snapshots). Anyway, even if they are supposed to work together this interaction is almost certainly the cause. 
Kris From owner-freebsd-fs@FreeBSD.ORG Wed Apr 2 07:31:10 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C2AE3106564A for ; Wed, 2 Apr 2008 07:31:10 +0000 (UTC) (envelope-from crahman@gmail.com) Received: from wf-out-1314.google.com (wf-out-1314.google.com [209.85.200.169]) by mx1.freebsd.org (Postfix) with ESMTP id 3E09F8FC21 for ; Wed, 2 Apr 2008 07:31:10 +0000 (UTC) (envelope-from crahman@gmail.com) Received: by wf-out-1314.google.com with SMTP id 25so2565614wfa.7 for ; Wed, 02 Apr 2008 00:31:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=UxUEHjHBUWON88Pm/JD62zdt6D4AIDI6s3WxZ8PB4qs=; b=Mi7Rol4xzFyp6ZRDqRhD7wkYCF8z07sNS0PNR7StdzwqJS2QOWCgiAa6muRJFkDNY6EDRgJ0MqBLlzkrcpXAvF6eV7kXyGo8DpV52ZoKdLfA2IvgVE8ndvJI2sGAqcBWwp0tzxjQOMnofGD7+wgGxejiRrXQURxDdE39vhegdXI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=o3PLamhvsPyArxLDwZFBwFYrGeRampiVJEZfwvQZUzlBLDZepuZKu1+c/3qqKMfX/Mnbzxg9kMYzFIi15XiAgrZktTOap0hjxc8CRKVq4Eg24AHxxZrCqDAo/0BI0PyqsIp20REEfQYHB2gOscEC+lkGypQ1d1KwPuqF5i/eJ9E= Received: by 10.142.226.2 with SMTP id y2mr5639536wfg.137.1207121470022; Wed, 02 Apr 2008 00:31:10 -0700 (PDT) Received: by 10.142.188.17 with HTTP; Wed, 2 Apr 2008 00:31:10 -0700 (PDT) Message-ID: <9e77bdb50804020031r2fba0840g7281e879522120d5@mail.gmail.com> Date: Wed, 2 Apr 2008 01:31:10 -0600 From: "Cyrus Rahman" To: "Kris Kennaway" In-Reply-To: <47F298C2.7040606@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: 
<9e77bdb50804011251q65eca371kc6bc9a60ac0c248@mail.gmail.com> <47F298C2.7040606@FreeBSD.org> Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: Trouble with snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Apr 2008 07:31:10 -0000 > > This is happening on a system set up with journaled ufs filesystems, > > so that may be part of the problem. The system is running amd64 > > with an Intel Q6600. > > I thought gjournal and soft updates were supposed to be mutually > exclusive (the latter is required for UFS snapshots). Anyway, even if > they are supposed to work together this interaction is almost certainly > the cause. I actually think that snapshots are a part of UFS2 and that they work just fine with or without soft updates. I was wondering if the problems I've seen are limited strictly to gjournal-based UFS2 systems. I'm guessing that they are, based upon the fact that the problems are dramatic enough that they would have shown up in discussion if they were widespread. But I also wondered if perhaps the additional concurrency associated with multiple processors might be a factor. As it is, it may be prudent for someone intending to use dump with snapshots to hold off on building filesystems with gjournal until this is resolved. Other than this problem, the gjournal/ufs integration has worked flawlessly here.
From owner-freebsd-fs@FreeBSD.ORG Wed Apr 2 14:38:03 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D3216106564A for ; Wed, 2 Apr 2008 14:38:03 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 8B3688FC1B for ; Wed, 2 Apr 2008 14:38:03 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Jh46D-000777-0y for freebsd-fs@freebsd.org; Wed, 02 Apr 2008 14:38:01 +0000 Received: from firewall.andxor.it ([195.223.2.2]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 02 Apr 2008 14:38:01 +0000 Received: from lapo by firewall.andxor.it with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 02 Apr 2008 14:38:01 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Lapo Luchini Date: Wed, 02 Apr 2008 16:37:49 +0200 Lines: 23 Message-ID: References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> <47F0F1E8.1080504@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: firewall.andxor.it User-Agent: Thunderbird 2.0.0.12 (X11/20080303) In-Reply-To: <47F0F1E8.1080504@fsn.hu> X-Enigmail-Version: 0.95.6 OpenPGP: id=C8F252FB Sender: news Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Apr 2008 14:38:03 -0000 Attila Nagy wrote: > On 2008.03.31. 
15:57, Attila Nagy wrote: >> My system completely locks up, I can't start new processes, but >> running ones -which don't do IO- can continue (for example a top). >> I don't know ZFS internals (BTW, /usr and others are of course >> different ZFS filesystems on the pool), but it might be that >> something major gets locked and that's why it stops here. > I forgot to mention -I don't know, maybe it's important- that I have an > SMP box (but I tried with a UP kernel, the effect is the same) and > compression is enabled on every filesystem. I have similar symptoms on a dual AMD64, 4x SATA GELI + RAIDZ, mainly after I turned off one drive out of 4 in a RAIDZ pool (one of the two SATA channels on the motherboard is flaky; I'm waiting for a new PCI controller). I can consistently reproduce it by mdconfig-uring a 120 GB image of a ddrescue-d HDD, mounting a UFS2 partition on it, and moving massive amounts of data onto it: that hangs within a few minutes. But it will lock up after a few hours anyway, even without touching that huge file. I'll try to produce some debugging information myself ASAP...
Lapo From owner-freebsd-fs@FreeBSD.ORG Thu Apr 3 11:14:09 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 903BF1065671 for ; Thu, 3 Apr 2008 11:14:09 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 04F838FC18 for ; Thu, 3 Apr 2008 11:14:08 +0000 (UTC) (envelope-from ticso@cicely12.cicely.de) Received: from cicely5.cicely.de ([10.1.1.7]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id m33Ad5ks051636 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 3 Apr 2008 12:39:05 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (cicely12.cicely.de [10.1.1.14]) by cicely5.cicely.de (8.13.4/8.13.4) with ESMTP id m33AcxaG038665 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 3 Apr 2008 12:39:00 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: from cicely12.cicely.de (localhost [127.0.0.1]) by cicely12.cicely.de (8.13.4/8.13.3) with ESMTP id m33AcxrE034481; Thu, 3 Apr 2008 12:38:59 +0200 (CEST) (envelope-from ticso@cicely12.cicely.de) Received: (from ticso@localhost) by cicely12.cicely.de (8.13.4/8.13.3/Submit) id m33AcxNh034480; Thu, 3 Apr 2008 12:38:59 +0200 (CEST) (envelope-from ticso) Date: Thu, 3 Apr 2008 12:38:59 +0200 From: Bernd Walter To: Attila Nagy Message-ID: <20080403103858.GX15954@cicely12.cicely.de> References: <47F0D02B.8060504@fsn.hu> <20080331152251.62526181@peedub.jennejohn.org> <47F0EDD6.8060402@fsn.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <47F0EDD6.8060402@fsn.hu> X-Operating-System: FreeBSD cicely12.cicely.de 5.4-STABLE alpha User-Agent: Mutt/1.5.9i X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED=-1.8, BAYES_00=-2.599 autolearn=ham version=3.2.3 
X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on cicely12.cicely.de Cc: freebsd-fs@freebsd.org Subject: Re: ZFS hangs very often X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Apr 2008 11:14:09 -0000 On Mon, Mar 31, 2008 at 03:57:42PM +0200, Attila Nagy wrote: > On 2008.03.31. 15:22, Gary Jennejohn wrote: > >On Mon, 31 Mar 2008 13:51:07 +0200 > >Attila Nagy wrote: > My system completely locks up, I can't start new processes, but running > ones -which don't do IO- can continue (for example a top). > I don't know ZFS internals (BTW, /usr and others are of course different > ZFS filesystems on the pool), but it might be that something major gets > locked and that's why it stops here. You can renice and kill a process using top, so if you have a running top you can still test this. I've seen this kind of hang as well, but since updating to -CURRENT from 15th March it has never happened again. I'm not aware of any commit that might have fixed it, though, so in the end it might just be luck. I didn't investigate the problem very much, because I have had several timeout problems with drives -which otherwise run fine- after adding further drives; this turned out to be an insufficient power supply, and every time I accessed the second pool at the same time I ran into trouble with the drives on the first pool. SATA drives seem to be too crappy to tell why they fail :( > ps: -CURRENT from around a month/half months ago still has this problem. Not for me, it seems, but as said above, it may be luck. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD machines, and more.
From owner-freebsd-fs@FreeBSD.ORG Sat Apr 5 08:12:46 2008 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3A129106566B; Sat, 5 Apr 2008 08:12:46 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 025618FC1C; Sat, 5 Apr 2008 08:12:46 +0000 (UTC) (envelope-from remko@FreeBSD.org) Received: from freefall.freebsd.org (remko@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m358CjAx058391; Sat, 5 Apr 2008 08:12:45 GMT (envelope-from remko@freefall.freebsd.org) Received: (from remko@localhost) by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m358CjmL058387; Sat, 5 Apr 2008 08:12:45 GMT (envelope-from remko) Date: Sat, 5 Apr 2008 08:12:45 GMT Message-Id: <200804050812.m358CjmL058387@freefall.freebsd.org> To: remko@FreeBSD.org, freebsd-i386@FreeBSD.org, freebsd-fs@FreeBSD.org From: remko@FreeBSD.org Cc: Subject: Re: bin/122172: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Apr 2008 08:12:46 -0000 Old Synopsis: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 New Synopsis: [amd] [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6 Responsible-Changed-From-To: freebsd-i386->freebsd-fs Responsible-Changed-By: remko Responsible-Changed-When: Sat Apr 5 08:11:45 UTC 2008 Responsible-Changed-Why: The backtraces show that amd(8) has a problem, reassign to the fs team to investigate this. http://www.freebsd.org/cgi/query-pr.cgi?pr=122172