From: Johan Hendriks
Date: Wed, 28 Mar 2012 12:50:36 +0200
To: Tim Bishop, freebsd-fs@freebsd.org
Subject: Re: ZFS: processes hanging when trying to access filesystems
Message-ID: <4F72ECFC.4040902@gmail.com>
In-Reply-To: <20120327181457.GC24787@carrick-users.bishnet.net>

Tim Bishop wrote:
> I have a machine running 8-STABLE amd64 from the end of last week. I
> have a problem where the machine starts to freeze up. Any process
> accessing the ZFS filesystems hangs, which eventually causes more and
> more processes to be spawned (cronjobs, etc, never complete). Although
> the root filesystem is on UFS (the machine hosts jails on ZFS),
> eventually I can't log in anymore.
>
> The problem occurs when the frequently used part of the ARC gets too
> large. See this graph:
>
> http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png
>
> At the right of the graph things started to hang.
>
> At the same time I see a high amount of context switching.
>
> I picked a hanging process and procstat showed the following:
>
> PID TID COMM TDNAME KSTACK
> 24787 100303 mutt - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8 null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
>
> I'm running a reduced amount of jails on the machine at the moment which
> is limiting the speed at which the machine freezes up completely. I'd
> like to debug this problem further, so any advice on useful information
> to collect would be appreciated.
>
> I've had this problem on the machine before[1] but adding more RAM
> alleviated the issue.
>
> Tim.
>
> [1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html
>

Just a "me too", except that I am on FreeBSD 9.0-RELEASE amd64.

About once a week the system starts to hang. The system boots from a UFS
mirror on ordinary disks; the zpool is on the SAS drives in the bays.

NFS is in the tx->tx state and cannot be restarted. The same goes for
mountd, and Samba leaves a lot of smbd processes in the zfs state.
The only way out is to reset and restart the machine.
We use the machine as an NFS server for two ESXi 5.0 machines.

The strange thing is that we have an almost identical machine that does
not show this behaviour: same board, memory and RAID controller.
The only difference is that that machine is in a 4U case.

The settings in loader.conf are:

    # ZFS
    zfs_load="YES"
    # Tuning
    vfs.zfs.arc_max="12G"

It is getting really frustrating: there is an Exchange server running on
it, and I am becoming an eseutil expert, which is not something I want :D.
The second problem is that the whole company cannot work, and I need to
get things back up as fast as I can.
So far it has only happened on Mondays, but the time of day is random.

The system is a 16-bay Supermicro with an LSI 9211-8i card and 16 GB of
memory on the X9SCM-F board.
Nothing fancy.

My ARC stats at this moment are below. The only thing is that I do not
fully know which stats are important.

zfs-stats -AE

ZFS Subsystem Report                            Wed Mar 28 12:34:54 2012
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                12.40m
        Recycle Misses:                         22.95k
        Mutex Misses:                           12.53k
        Evict Skips:                            64.35k

ARC Size:                               95.44%  11.45   GiB
        Target Size: (Adaptive)         95.44%  11.45   GiB
        Min Size (Hard Limit):          12.50%  1.50    GiB
        Max Size (High Water):          8:1     12.00   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       93.75%  10.74   GiB
        Frequently Used Cache Size:     6.25%   733.11  MiB

ARC Hash Breakdown:
        Elements Max:                           308.73k
        Elements Current:               99.68%  307.75k
        Collisions:                             13.69m
        Chain Max:                              16
        Chains:                                 77.43k

------------------------------------------------------------------------

ARC Efficiency:                                 69.94m
        Cache Hit Ratio:                83.10%  58.12m
        Cache Miss Ratio:               16.90%  11.82m
        Actual Hit Ratio:               67.26%  47.04m

        Data Demand Efficiency:         94.55%  35.86m
        Data Prefetch Efficiency:       43.37%  17.21m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             17.49%  10.16m
          Most Recently Used:           24.32%  14.13m
          Most Frequently Used:         56.62%  32.91m
          Most Recently Used Ghost:     0.24%   140.34k
          Most Frequently Used Ghost:   1.34%   776.73k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  58.33%  33.90m
          Prefetch Data:                12.84%  7.46m
          Demand Metadata:              22.47%  13.06m
          Prefetch Metadata:            6.35%   3.69m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  16.53%  1.95m
          Prefetch Data:                82.46%  9.75m
          Demand Metadata:              0.42%   50.02k
          Prefetch Metadata:            0.60%   70.50k

------------------------------------------------------------------------

The latest top output from when it hung is below. I had already shut
down Samba and restarted nfsd and mountd, but these also hang.

 1719 root        4  20    0 10000K  1300K tx->tx  0  38:03  0.00% nfsd
 1884 root        1  20    0 24380K  3112K select  3   0:07  0.00% ntpd
 1933 root        1  26    0 18500K  1860K nanslp  0   0:00  0.00% cron
 1606 root        1  20    0 16424K  1776K select  3   0:00  0.00% syslogd
 1695 root        1  20    0 14292K  1828K select  1   0:00  0.00% nfsuserd
 1692 root        1  20    0 14292K  1828K select  3   0:00  0.00% nfsuserd
 1693 root        1  20    0 14292K  1828K select  0   0:00  0.00% nfsuserd
 1694 root        1  20    0 14292K  1828K select  2   0:00  0.00% nfsuserd
19312 adminusr    1  20    0 70184K  5524K select  1   0:00  0.00% sshd
19412 root        1  20    0 20940K  2536K CPU0    0   0:00  0.00% top
19164 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19309 root        1  21    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19175 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19228 root        1  20    0 70184K  5440K sbwait  0   0:00  0.00% sshd
19240 root        1  20    0 80784K 12068K zfs     3   0:00  0.00% smbd
19286 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
19131 root        1  20    0 80784K 12060K zfs     3   0:00  0.00% smbd
18887 root        1  20    0 80784K 12060K zfs     1   0:00  0.00% smbd
19095 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
19089 root        1  20    0 80784K 12064K zfs     1   0:00  0.00% smbd
18929 root        1  20    0 80784K 12064K zfs     0   0:00  0.00% smbd
18977 root        1  21    0 80784K 12056K zfs     1   0:00  0.00% smbd
19062 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
18944 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
19063 root        1  20    0 80784K 12056K zfs     1   0:00  0.00% smbd
19231 adminusr    1  20    0 70184K  5524K select  2   0:00  0.00% sshd
19317 root        1  20    0 21812K  3152K wait    0   0:00  0.00% bash
19178 adminusr    1  20    0 70184K  5524K select  1   0:00  0.00% sshd
19236 root        1  21    0 21812K  3156K wait    0   0:00  0.00% bash
18924 root        1  20    0 80784K 12060K zfs     3   0:00  0.00% smbd

To the topic starter: how do you get the graphs for the ARC data?

regards
Johan Hendriks
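
On the graphing question, one possible approach (not necessarily how the graph linked above was produced) is to sample the ARC counters that FreeBSD exposes under the kstat.zfs.misc.arcstats sysctl tree and feed them to whatever plotting tool is at hand. The sketch below is only an illustration under that assumption: the helper names parse_arcstats, summarize, read_arcstats and log_forever are hypothetical, and the hit ratio it reports is cumulative since boot, like the zfs-stats figures above.

```python
import subprocess
import time

PREFIX = "kstat.zfs.misc.arcstats."
GIB = 1024 ** 3


def parse_arcstats(text):
    """Turn `sysctl kstat.zfs.misc.arcstats` output into {short_name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not name.startswith(PREFIX):
            continue  # skip anything that is not an arcstats line
        try:
            stats[name[len(PREFIX):]] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric values
    return stats


def summarize(stats):
    """Reduce the raw counters to a few values worth graphing."""
    hits = stats.get("hits", 0)
    misses = stats.get("misses", 0)
    total = hits + misses
    return {
        "size_gib": stats.get("size", 0) / GIB,    # current ARC size
        "target_gib": stats.get("c", 0) / GIB,     # adaptive target size
        "hit_ratio_pct": 100.0 * hits / total if total else 0.0,
    }


def read_arcstats():
    """Run sysctl once and parse its output (needs a host with ZFS loaded)."""
    out = subprocess.run(
        ["sysctl", "kstat.zfs.misc.arcstats"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_arcstats(out)


def log_forever(interval_s=60):
    """Print one timestamped sample per interval, suitable for plotting."""
    while True:
        s = summarize(read_arcstats())
        print(f"{int(time.time())} {s['size_gib']:.2f} "
              f"{s['target_gib']:.2f} {s['hit_ratio_pct']:.2f}", flush=True)
        time.sleep(interval_s)
```

Calling log_forever() from a small wrapper and redirecting the output to a file gives a time series that would still be on disk after a hang, so the samples leading up to the freeze can be compared with the healthy ones.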