From owner-freebsd-fs@FreeBSD.ORG Fri May 1 13:10:06 2015
Date: Fri, 1 May 2015 15:09:24 +0200
From: Martijn
To: Robert David
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS stuck on write
Message-ID: <20150501130924.GA12031@kobol.datajust.com>
In-Reply-To: <20150430143017.GA5573@kobol.office.hostage.nl>
References: <20150430134659.GA4950@kobol.office.hostage.nl> <20150430161902.4868094c@robert-notebook> <20150430143017.GA5573@kobol.office.hostage.nl>
User-Agent: Mutt/1.5.23 (2014-03-12)

Hi,

I've been able to borrow a brand new server from my supplier, put the HDDs
in, and all my problems are gone. I'm going to look into this in more
detail, because I still want to pinpoint the exact problem and find out
whether it really was caused by broken hardware.

Is my assumption correct that the extra memory (the new machine has 32GB
instead of 24GB) could also be the reason everything works again? If I
remember correctly, a lot of sysctl settings have defaults that are derived
from the total amount of RAM.

I'm also still wondering whether there are settings that would help when
there are simply too many filesystems and snapshots around, or is that not
something that could have caused these problems?
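For what it's worth, this is roughly what I intend to compare between the
old 24GB box and the new 32GB one; just an untested sketch off the top of
my head, so the exact sysctl names may differ between releases:

  # RAM-derived VM/ZFS limits (compare old vs. new machine)
  sysctl hw.physmem vm.kmem_size vm.kmem_size_max
  sysctl vfs.zfs.arc_max vfs.zfs.arc_meta_limit
  sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max

  # rough count of how many filesystems and snapshots the pool carries
  zfs list -H -t filesystem | wc -l
  zfs list -H -t snapshot | wc -l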
Thanks,
Martijn Lina.

--
Hostage
Keizersgracht 316
1016 EZ Amsterdam
tel: +31 (0)20 4632 303
http://www.hostage.nl

Once upon a 30 Apr 2015, Martijn hit keys in the following order:
> Sorry....
>
> No features like compress or dedup. The only devices in the pool are the
> storage HDDs:
>
>         NAME            STATE     READ WRITE CKSUM
>         zroot           ONLINE       0     0     0
>           raidz1-0      ONLINE       0     0     0
>             gpt/diskil  ONLINE       0     0     0
>             gpt/diskir  ONLINE       0     0     0
>             gpt/diskol  ONLINE       0     0     0
>             gpt/diskor  ONLINE       0     0     0
>
> # sysctl -a | grep l2arc
> vfs.zfs.l2arc_write_max: 8388608
> vfs.zfs.l2arc_write_boost: 8388608
> vfs.zfs.l2arc_headroom: 2
> vfs.zfs.l2arc_feed_secs: 1
> vfs.zfs.l2arc_feed_min_ms: 200
> vfs.zfs.l2arc_noprefetch: 1
> vfs.zfs.l2arc_feed_again: 1
> vfs.zfs.l2arc_norw: 1
>
> An example stuck process:
>
>   PID    TID COMM             TDNAME           KSTACK
>  4232 100606 bsdtar           -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xc9e vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 extattr_list_vp+0x3c sys_extattr_list_fd+0xa4 amd64_syscall+0x351 Xfast_syscall+0xfb
>
> # procstat -kk 4
>   PID    TID COMM             TDNAME           KSTACK
>     4 100092 zfskern          arc_reclaim_thre mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18b arc_reclaim_thread+0x301 fork_exit+0x9a fork_trampoline+0xe
>     4 100093 zfskern          l2arc_feed_threa mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18b l2arc_feed_thread+0x16f fork_exit+0x9a fork_trampoline+0xe
>     4 100396 zfskern          trim zroot       mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18b trim_thread+0x9e fork_exit+0x9a fork_trampoline+0xe
>     4 100406 zfskern          txg_thread_enter mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x2bb fork_exit+0x9a fork_trampoline+0xe
>     4 100407 zfskern          txg_thread_enter mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d txg_sync_thread+0x2eb fork_exit+0x9a fork_trampoline+0xe
>
> If you need more, I'm happy to provide it.
>
> Could it be hardware related? If the registered ECC memory were broken,
> wouldn't it show up on the console?
>
> Thanks,
> Martijn.
>
> Once upon a 30 Apr 2015, Robert David hit keys in the following order:
> > Hi Martijn,
> >
> > There is too little information provided here to suggest anything.
> >
> > What about the pool size and free space? Any quotas being exceeded? Any
> > features enabled (compress, dedup)? ZIL, L2ARC?
> >
> > Regards,
> > Robert.
> >
> > On Thu, 30 Apr 2015 15:47:00 +0200
> > Martijn wrote:
> >
> > > Hi,
> > >
> > > I've been trying to get an important production machine stable again
> > > since yesterday afternoon, but to no avail (so far).
> > >
> > > It seems ZFS is the problem on this box. The situation is as follows:
> > >
> > > - It used to run FreeBSD 8.3. After reading about deadlocks that have
> > >   been fixed in the meantime, I upgraded to 10.1: no change. I did a
> > >   zpool upgrade: no change. I did zfs upgrade -a: also no change...
> > >
> > > - It's a machine on which each user has a separate ZFS filesystem with
> > >   refquota set. It also took periodic ZFS snapshots every hour (48),
> > >   day (14), week (8) and month (24), which is way too much, but at the
> > >   time of setting it up I thought it couldn't hurt.
> > >
> > > - After some usage the machine gets stuck when trying to write a file.
> > >   The process just stops and can't be killed. After some time the whole
> > >   machine used to become unresponsive on 8.3; on 10.1 I can still reach
> > >   it, although processes attempting to write stay stuck forever.
> > >
> > > - Nothing scary shows up in dmesg.
> > >
> > > What can I do? The machine has 24GB of registered ECC RAM (17GB free),
> > > and it's a RAID-Z pool with 4 SATA HDDs on an LSI SAS3442E-R (1068
> > > chip) in IT mode.
> > >
> > > loader.conf:
> > >
> > > vfs.zfs.arc_max=8G
> > > vfs.zfs.txg.timeout="5"
> > > vfs.zfs.prefetch_disable="1"
> > > vfs.zfs.vdev.min_pending="3"
> > > vfs.zfs.vdev.max_pending="6"
> > > vfs.zfs.txg.write_limit_override=1073741824
> > >
> > > I've tried to copy the most important users to another machine, but
> > > that's going to take a lot of time. There are 160 users (websites +
> > > mailboxes) on it.
> > >
> > > Any help would be much appreciated!
> > >
> > > Thanks in advance,
> > >
> > > Martijn
> >
> --
> Hostage
> Keizersgracht 316
> 1016 EZ Amsterdam
> tel: +31 (0)20 4632 303
> http://www.hostage.nl
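P.S. Regarding the loader.conf tunables quoted above: before drawing any
conclusions I want to check which of them the 10.1 kernel still recognizes
at all. Roughly like this (untested sketch, /bin/sh):

  for t in vfs.zfs.arc_max vfs.zfs.txg.timeout vfs.zfs.prefetch_disable \
           vfs.zfs.vdev.min_pending vfs.zfs.vdev.max_pending \
           vfs.zfs.txg.write_limit_override; do
      # print the current value, or flag tunables that no longer exist
      sysctl $t 2>/dev/null || echo "$t: not known to this kernel"
  done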