From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 04:54:45 2011
Date: Sat, 12 Mar 2011 20:54:44 -0800
From: Doug Barton
To: Aditya Sarawgi
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow

On 11/08/2010 09:43, Aditya Sarawgi wrote:

> I have attached the patch.

Obviously jhb has been doing a lot of work in the ext2fs area, which
is well appreciated. However, I'm not seeing some of the improvements
from bde's patch, Zheng's pre-allocation patch, etc. Are these changes
no longer relevant?

Doug

-- 
Nothin' ever doesn't change, but nothin' changes much. -- OK Go
Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 06:00:19 2011
Date: Sun, 13 Mar 2011 14:00:02 +0800
From: gnehzuil
To: Doug Barton
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow

Hi Doug,

I have implemented reallocblks in ext2fs, but it only brings a small
improvement on my machine and needs more benchmarking. Would you be
willing to do some testing?

Thank you.

Best regards,
lz

On 03/13/2011 12:54 PM, Doug Barton wrote:
> On 11/08/2010 09:43, Aditya Sarawgi wrote:
>
>> I have attached the patch.
>
> Obviously jhb has been doing a lot of work in the ext2fs area, which
> is well appreciated. However, I'm not seeing some of the improvements
> from bde's patch, Zheng's pre-allocation patch, etc. Are these changes
> no longer relevant?
>
> Doug

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 07:08:04 2011
Date: Sun, 13 Mar 2011 16:05:58 +1100 (EST)
From: Bruce Evans
To: Doug Barton
Cc: fs@freebsd.org
Subject: Re: ext2fs now extremely slow

On Sat, 12 Mar 2011, Doug Barton wrote:

> On 11/08/2010 09:43, Aditya Sarawgi wrote:
>
>> I have attached the patch.
>
> Obviously jhb has been doing a lot of work in the ext2fs area, which is well
> appreciated. However, I'm not seeing some of the improvements from bde's
> patch, Zheng's pre-allocation patch, etc. Are these changes no longer
> relevant?

jhb committed an improved version of my patch, and maybe a cleaned-up
version of the original. I haven't tested the improved version.
Bruce

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 16:44:25 2011
Date: Sun, 13 Mar 2011 16:44:30 +0000
From: Tim Bishop
To: freebsd-fs@freebsd.org
Subject: Re: ZFS system unresponsive

On Sun, Feb 27, 2011 at 12:32:17PM +0000, Tim Bishop wrote:
> I have a ZFS system that has become unresponsive. It's running amd64
> 8-STABLE as of approximately 20 Dec. It has a UFS-based root file
> system and then a ZFS mirror for a handful of jails.
>
> It seems to get into this state occasionally, but eventually can
> unblock itself. This may take hours though.
>
> top -HSj shows the following processes active:
>
> PID JID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> 0 0 root -16 0 0K 1456K zio->i 0 28.9H 7.08% {zio_write_issue}
> 5 0 root -8 - 0K 60K zio->i 0 776:59 0.29% {txg_thread_enter}
>
> A procstat on those processes shows:
>
> 0 100068 kernel zio_write_issue mi_switch sleepq_wait _cv_wait zio_wait dmu_buf_hold_array_by_dnode dmu_read space_map_load metaslab_activate metaslab_alloc zio_dva_allocate zio_execute taskq_run_safe taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline
> 5 100094 zfskern txg_thread_enter mi_switch sleepq_wait _cv_wait txg_thread_wait txg_quiesce_thread fork_exit fork_trampoline
> 5 100095 zfskern txg_thread_enter mi_switch sleepq_wait _cv_wait zio_wait dsl_pool_sync spa_sync txg_sync_thread fork_exit fork_trampoline
>
> (I have the full procstat -k output for those PIDs if needed)
>
> Other processes, such as my hourly zfs snapshots, appear to be wedged:
>
> root 7407 0.0 0.0 14672 1352 ?? D 10:00AM 0:00.46 /sbin/zfs snapshot -r pool0@2011-02-27_10.00.01--1d
> root 10184 0.0 0.0 14672 1444 ?? D 11:00AM 0:00.36 /sbin/zfs snapshot -r pool0@2011-02-27_11.00.00--1d
> root 12938 0.0 0.0 14672 1516 ?? D 12:00PM 0:00.11 /sbin/zfs snapshot -r pool0@2011-02-27_12.00.01--1d
>
> PID TID COMM TDNAME KSTACK
> 7407 100563 zfs - mi_switch sleepq_wait _cv_wait txg_wait_synced dsl_sync_task_group_wait dmu_objset_snapshot zfs_ioc_snapshot zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscallenter syscall Xfast_syscall
> 10184 100707 zfs - mi_switch sleepq_wait _cv_wait txg_wait_synced dsl_sync_task_group_wait dmu_objset_snapshot zfs_ioc_snapshot zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscallenter syscall Xfast_syscall
> 12938 100159 zfs - mi_switch sleepq_wait _cv_wait txg_wait_synced dsl_sync_task_group_wait dmu_objset_snapshot zfs_ioc_snapshot zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscallenter syscall Xfast_syscall
>
> zfs-stats output as follows:
>
> ------------------------------------------------------------------------
> ZFS Subsystem Report                          Sun Feb 27 12:20:20 2011
> ------------------------------------------------------------------------
> System Information:
>
> Kernel Version: 801501 (osreldate)
> Hardware Platform: amd64
> Processor Architecture: amd64
>
> FreeBSD 8.2-PRERELEASE #3: Mon Dec 20 20:54:55 GMT 2010 tdb
> 12:23pm up 68 days, 14:07, 2 users, load averages: 0.35, 0.39, 0.35
> ------------------------------------------------------------------------
> System Memory Statistics:
> Physical Memory: 3061.63M
> Kernel Memory: 1077.46M
> DATA: 99.12% 1067.93M
> TEXT: 0.88% 9.53M
> ------------------------------------------------------------------------
> ZFS pool information:
> Storage pool Version (spa): 15
> Filesystem Version (zpl): 4
> ------------------------------------------------------------------------
> ARC Misc:
> Deleted: 148418216
> Recycle Misses: 51095797
> Mutex Misses: 370820
> Evict Skips: 370820
>
> ARC Size:
> Current Size (arcsize): 55.86% 1087.64M
> Target Size (Adaptive, c): 56.50% 1100.22M
> Min Size (Hard Limit, c_min): 12.50% 243.40M
> Max Size (High Water, c_max): ~8:1 1947.20M
>
> ARC Size Breakdown:
> Recently Used Cache Size (p): 6.25% 68.77M
> Freq. Used Cache Size (c-p): 93.75% 1031.45M
>
> ARC Hash Breakdown:
> Elements Max: 398079
> Elements Current: 38.65% 153870
> Collisions: 230805591
> Chain Max: 34
> Chains: 24344
>
> ARC Eviction Statistics:
> Evicts Total: 4560897494528
> Evicts Eligible for L2: 99.99% 4560573588992
> Evicts Ineligible for L2: 0.01% 323905536
> Evicts Cached to L2: 0
>
> ARC Efficiency:
> Cache Access Total: 1761824967
> Cache Hit Ratio: 84.82% 1494437389
> Cache Miss Ratio: 15.18% 267387578
> Actual Hit Ratio: 84.82% 1494411236
>
> Data Demand Efficiency: 83.35%
>
> CACHE HITS BY CACHE LIST:
> Most Recently Used (mru): 7.86% 117410213
> Most Frequently Used (mfu): 92.14% 1377001023
> MRU Ghost (mru_ghost): 0.63% 9445180
> MFU Ghost (mfu_ghost): 7.99% 119349696
>
> CACHE HITS BY DATA TYPE:
> Demand Data: 35.75% 534254771
> Prefetch Data: 0.00% 0
> Demand Metadata: 64.25% 960153880
> Prefetch Metadata: 0.00% 28738
>
> CACHE MISSES BY DATA TYPE:
> Demand Data: 39.91% 106712177
> Prefetch Data: 0.00% 0
> Demand Metadata: 60.01% 160446249
> Prefetch Metadata: 0.09% 229152
> ------------------------------------------------------------------------
> VDEV Cache Summary:
> Access Total: 155663083
> Hits Ratio: 70.91% 110387854
> Miss Ratio: 29.09% 45275229
> Delegations: 91183
> ------------------------------------------------------------------------
> ZFS Tunable (sysctl):
> kern.maxusers=384
> vfs.zfs.l2c_only_size=0
> vfs.zfs.mfu_ghost_data_lsize=23343104
> vfs.zfs.mfu_ghost_metadata_lsize=302204928
> vfs.zfs.mfu_ghost_size=325548032
> vfs.zfs.mfu_data_lsize=524091904
> vfs.zfs.mfu_metadata_lsize=52224
> vfs.zfs.mfu_size=533595136
> vfs.zfs.mru_ghost_data_lsize=30208
> vfs.zfs.mru_ghost_metadata_lsize=727952896
> vfs.zfs.mru_ghost_size=727983104
> vfs.zfs.mru_data_lsize=100169216
> vfs.zfs.mru_metadata_lsize=0
> vfs.zfs.mru_size=339522048
> vfs.zfs.anon_data_lsize=0
> vfs.zfs.anon_metadata_lsize=0
> vfs.zfs.anon_size=10959360
> vfs.zfs.l2arc_norw=1
> vfs.zfs.l2arc_feed_again=1
> vfs.zfs.l2arc_noprefetch=0
> vfs.zfs.l2arc_feed_min_ms=200
> vfs.zfs.l2arc_feed_secs=1
> vfs.zfs.l2arc_headroom=2
> vfs.zfs.l2arc_write_boost=8388608
> vfs.zfs.l2arc_write_max=8388608
> vfs.zfs.arc_meta_limit=510447616
> vfs.zfs.arc_meta_used=513363680
> vfs.zfs.mdcomp_disable=0
> vfs.zfs.arc_min=255223808
> vfs.zfs.arc_max=2041790464
> vfs.zfs.zfetch.array_rd_sz=1048576
> vfs.zfs.zfetch.block_cap=256
> vfs.zfs.zfetch.min_sec_reap=2
> vfs.zfs.zfetch.max_streams=8
> vfs.zfs.prefetch_disable=1
> vfs.zfs.check_hostid=1
> vfs.zfs.recover=0
> vfs.zfs.txg.write_limit_override=0
> vfs.zfs.txg.synctime=5
> vfs.zfs.txg.timeout=30
> vfs.zfs.scrub_limit=10
> vfs.zfs.vdev.cache.bshift=16
> vfs.zfs.vdev.cache.size=10485760
> vfs.zfs.vdev.cache.max=16384
> vfs.zfs.vdev.aggregation_limit=131072
> vfs.zfs.vdev.ramp_rate=2
> vfs.zfs.vdev.time_shift=6
> vfs.zfs.vdev.min_pending=4
> vfs.zfs.vdev.max_pending=10
> vfs.zfs.cache_flush_disable=0
> vfs.zfs.zil_disable=0
> vfs.zfs.zio.use_uma=0
> vfs.zfs.version.zpl=4
> vfs.zfs.version.spa=15
> vfs.zfs.version.dmu_backup_stream=1
> vfs.zfs.version.dmu_backup_header=2
> vfs.zfs.version.acl=1
> vfs.zfs.debug=0
> vfs.zfs.super_owner=0
> vm.kmem_size=3115532288
> vm.kmem_size_scale=1
> vm.kmem_size_min=0
> vm.kmem_size_max=329853485875
> ------------------------------------------------------------------------
>
> I hope somebody can give me some pointers on where to go with this.
>
> I'm just about to reboot (when it unwedges) and upgrade to the latest
> 8-STABLE to see if that helps.

I did the upgrade to 8-STABLE and it didn't help. I'm still seeing the
same issue.

Someone else in another thread mentioned graphing the zfs-stats output
with munin, so I thought I'd give that a whirl. Here are the results:

(static snapshot of data so it doesn't change between when I write this
and when somebody reads it)

http://www.bishnet.net/tim/tmp/munin/carrick/carrick/index.html

This slowdown is triggered by the tarsnap backups I run each day. They
start at about 01:15 (in the same timezone as the graphs), and this one
finished at around 11:15 after I poked it. It slows down when processing
my Maildir folders, but doesn't actually stop - it's processing a single
file every couple of seconds.

Here's a graph showing the disk I/O:

http://www.bishnet.net/tim/tmp/statgrab/disk.ad4.read_bytes-day.png

That flat period from about 06:30 until 11:00 is where it appears to
get stuck. Disk I/O stays constant, as can be seen in the graph.
Killing the tarsnap process unjams it, which lets it go on to backing
up the next tree of files (I run a handful of tarsnap processes one
after another).

The zpool iostat looks something like this during that period:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool0        111G  23.2G    248    114   865K   269K
  mirror     111G  23.2G    248    114   865K   269K
    ad4s3       -      -     43     56  2.47M   269K
    ad6s3       -      -     39     56  2.41M   269K
----------  -----  -----  -----  -----  -----  -----

So it's showing more I/O on the vdevs than on the actual pool. This,
tied together with my previous finding that the ZFS kernel processes
are using a lot of CPU time, suggests to me that ZFS is doing work
internally which is slowing down all external operations on the
filesystem.

In particular, I find this graph quite interesting:

http://www.bishnet.net/tim/tmp/munin/carrick/carrick/zfs_arc_utilization.html

The ARC usage is jumping around all over the place while the backups
are running.

Thanks for taking the time to read this; I appreciate any input.

Tim.

-- 
Tim Bishop http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
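(Aside: the ARC numbers above come from zfs-stats, but they can also be
sampled directly for graphing. A minimal C sketch, assuming the
kstat.zfs.misc.arcstats.* sysctls exported by ZFS on FreeBSD; the exact
names may differ by version:)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* ARC counters assumed to be exported as kstat.zfs.misc.arcstats.* */
	const char *names[] = {
		"kstat.zfs.misc.arcstats.size",
		"kstat.zfs.misc.arcstats.hits",
		"kstat.zfs.misc.arcstats.misses",
	};
	uint64_t val;
	size_t len;
	unsigned i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		len = sizeof(val);
		/* Each counter is a single 64-bit value. */
		if (sysctlbyname(names[i], &val, &len, NULL, 0) == -1) {
			perror(names[i]);
			continue;
		}
		printf("%s = %ju\n", names[i], (uintmax_t)val);
	}
	return (0);
}

The same names can be queried interactively with sysctl(8), which is
handy for a quick look before wiring anything into munin.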
From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 16:46:06 2011
Date: Sun, 13 Mar 2011 16:46:11 +0000
From: Tim Bishop
To: "Vladislav V. Prodan"
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS system unresponsive

Hi,

On Sun, Feb 27, 2011 at 11:08:35PM +0200, Vladislav V. Prodan wrote:
> 27.02.2011 14:32, Tim Bishop wrote:
> > I have a ZFS system that has become unresponsive. It's running amd64
> > 8-STABLE as of approximately 20 Dec. It has a UFS-based root file
> > system and then a ZFS mirror for a handful of jails.
> >
> > It seems to get into this state occasionally, but eventually can
> > unblock itself. This may take hours though.
>
> In such cases, only the reset button helps.
> I already wrote about this bug - kern/153351
> I have one machine updated to 8.2-PRERELEASE #0: Fri Feb 18 18:27:12
> EET 2011; the bug has not popped up on it yet...

I'm not sure this is the same issue as I'm seeing. Even when the system
is running slowly, things like ls still work; they just take a while.

Tim.

-- 
Tim Bishop http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 18:08:06 2011
Date: Sun, 13 Mar 2011 11:07:59 -0700
From: Jeremy Chadwick
To: Tim Bishop
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS system unresponsive

I'm snipping the statistics; people can instead read them via the
mailing list archive URL:

http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010796.html

On Sun, Mar 13, 2011 at 04:44:30PM +0000, Tim Bishop wrote:
> I did the upgrade to 8-STABLE and it didn't help. I'm still seeing the
> same issue.
> Someone else in another thread mentioned graphing the zfs-stats output
> with munin, so I thought I'd give that a whirl. Here are the results:
>
> (static snapshot of data so it doesn't change between when I write this
> and when somebody reads it)
>
> http://www.bishnet.net/tim/tmp/munin/carrick/carrick/index.html
>
> This slowdown is triggered by the tarsnap backups I run each day. They
> start at about 01:15 (in the same timezone as the graphs), and this one
> finished at around 11:15 after I poked it. It slows down when
> processing my Maildir folders, but doesn't actually stop - it's
> processing a single file every couple of seconds.

The below may be informational to you, or at least give you something to
ponder. It may not be the problem you're experiencing, so I don't want
to divert you from your efforts. Please keep that in mind when reading
the below.

I've ranted and raved in the past about how badly Maildir performs under
ZFS (or NFS for that matter). This is one of the reasons I switched back
to using classic mail spools. I'm well aware of the problems with them,
but Maildir is absurdly painful (I often describe it as "rude"). It
performs awfully at my workplace when used on an NFS-backed filesystem
(using a very high-end NetApp filer) with Solaris 10 as a client, and
also performs horribly on a smaller (10-15 user) Solaris 10 server
(backed by ZFS filesystems) using Dovecot for IMAP and Maildir.

Focusing on FreeBSD, I've written about the abysmal performance here:

http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-read-speed/
http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-speed-part-2/
http://koitsu.wordpress.com/2009/06/14/freebsd-and-zfs-horrible-raidz1-speed-finale/
http://koitsu.wordpress.com/2009/06/17/freebsd-and-zfs-more-performance-quirks/

Be aware the last blog post may have been addressed since its creation;
that is to say, things in ZFS on FreeBSD have changed. However, the
Maildir performance problem almost certainly hasn't, and to be honest I
don't think a filesystem should be catering to a mail format.

The only real solution is a new format called MIX. After reading the
below you'll probably get quite excited, but become depressed when you
hear that basically nobody has bothered to implement support for it
(especially mutt, which saddens me greatly):

http://en.wikipedia.org/wiki/MIX_%28Email%29

All this said, if you can get rid of Maildir (convert things back to
classic spools) as a form of testing and see how things perform after
that, I would strongly recommend it.

I imagine there is probably something that can help with the Maildir
performance issue under ZFS, such as making use of a "cache" device
(which makes use of L2ARC; don't confuse this with ARC). Note that
"cache" devices on pools need to be very fast for reading; SSDs tend to
work beautifully for this. But IMHO, the cost of an SSD just to "attempt
to get good performance with Maildir" isn't worth it. Gotta think about
it in the long run...

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP 4BD6C0CB |
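(Aside: the metadata-heavy pattern Jeremy describes is easy to see in
code - scanning a Maildir means one readdir() pass plus one stat() per
message, and on ZFS each of those stats can mean another metadata read.
A minimal sketch of that per-file pattern, with a hypothetical
directory name:)

#include <sys/stat.h>
#include <dirent.h>
#include <stdio.h>

/*
 * Walk one Maildir subdirectory the way a mail client must: one
 * readdir() pass plus one stat() per message file.  The per-file
 * stat() calls are where Maildir hurts on ZFS.
 */
int
main(void)
{
	const char *dirpath = "Maildir/cur";	/* hypothetical path */
	char path[1024];
	struct stat sb;
	struct dirent *de;
	DIR *dp;
	long files = 0;
	long long bytes = 0;

	if ((dp = opendir(dirpath)) == NULL) {
		perror(dirpath);
		return (1);
	}
	while ((de = readdir(dp)) != NULL) {
		if (de->d_name[0] == '.')	/* skip ".", "..", dotfiles */
			continue;
		snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
		if (stat(path, &sb) == 0) {	/* one stat per message */
			files++;
			bytes += sb.st_size;
		}
	}
	closedir(dp);
	printf("%ld messages, %lld bytes, %ld stat() calls\n",
	    files, bytes, files);
	return (0);
}

A classic mbox spool, by contrast, is one open() and a sequential read,
which is the difference Jeremy's spool suggestion trades on.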
From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 18:26:14 2011
Date: Sun, 13 Mar 2011 18:26:19 +0000
From: Tim Bishop
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS system unresponsive

Hi Jeremy,

On Sun, Mar 13, 2011 at 11:07:59AM -0700, Jeremy Chadwick wrote:
> I'm snipping the statistics; people can instead read them via the
> mailing list archive URL:
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-February/010796.html
>
> On Sun, Mar 13, 2011 at 04:44:30PM +0000, Tim Bishop wrote:
> > I did the upgrade to 8-STABLE and it didn't help. I'm still seeing
> > the same issue.
> >
> > Someone else in another thread mentioned graphing the zfs-stats
> > output with munin, so I thought I'd give that a whirl. Here are the
> > results:
> >
> > (static snapshot of data so it doesn't change between when I write
> > this and when somebody reads it)
> >
> > http://www.bishnet.net/tim/tmp/munin/carrick/carrick/index.html
> >
> > This slowdown is triggered by the tarsnap backups I run each day.
> > They start at about 01:15 (in the same timezone as the graphs), and
> > this one finished at around 11:15 after I poked it. It slows down
> > when processing my Maildir folders, but doesn't actually stop - it's
> > processing a single file every couple of seconds.
>
> The below may be informational to you, or at least give you something
> to ponder. It may not be the problem you're experiencing, so I don't
> want to divert you from your efforts. Please keep that in mind when
> reading the below.

[snip ZFS and Maildir notes]

Thanks for your reply. I'll check those details over to see if there's
anything of relevance. But I just wanted to point out that this doesn't
happen all the time. It happens maybe 1 night in 5 when backing up. So
it doesn't seem like an inherent issue with Maildir on ZFS.

But yes, I do agree Maildir on ZFS is poor. I also have a
Maildir+Solaris+NFS setup at work :-)

Tim.
-- 
Tim Bishop http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 19:33:47 2011
Date: Sun, 13 Mar 2011 20:33:20 +0100
From: Pawel Jakub Dawidek
To: Jilles Tjoelker
Cc: freebsd-fs@freebsd.org, freebsd-standards@freebsd.org
Subject: Re: open(O_NOFOLLOW) error when encountered symlink

On Sat, Mar 12, 2011 at 08:31:32PM +0100, Jilles Tjoelker wrote:
> On Sat, Mar 12, 2011 at 07:01:23PM +0200, Kostik Belousov wrote:
> > Hello,
> > I noted the following discussion and commits in the gnu tar repository:
>
> > http://lists.gnu.org/archive/html/bug-tar/2010-11/msg00080.html
> > http://git.savannah.gnu.org/cgit/tar.git/commit/?id=1584b72ff271e7f826dd64d7a1c7cd2f66504acb
> > http://git.savannah.gnu.org/cgit/tar.git/commit/?id=649b747913d2b289e904b5f1d222af886acd209c
>
> > The issue is that in case of open(path, O_NOFOLLOW), when path names
> > a symlink, FreeBSD returns the EMLINK error. On the other hand, the
> > POSIX requirement is absolutely clear that it shall be ELOOP.
>
> > I found FreeBSD commit r35088 that specifically changed the error code
> > from the required ELOOP to EMLINK. I doubt that anybody can remember
> > the reason for a change made more than 12 years ago.
>
> In fact that change was done hours after the new ELOOP error was
> introduced.

I don't think that POSIX knew about O_NOFOLLOW at that time, so peter@
properly predicted ELOOP, but some evil creature convinced him to change
it to EMLINK. This is from 2004:

http://pubs.opengroup.org/onlinepubs/009695399/functions/open.html

and there is not a word about O_NOFOLLOW.

PS. I'm voting with both hands to change it to ELOOP.
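(Aside: the difference is easy to observe from userland. A minimal
sketch - the scratch file names are purely illustrative - that creates
a dangling symlink and reports which errno open(2) with O_NOFOLLOW
returns on the running system:)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	int fd;

	/* Create a scratch symlink and try to open it with O_NOFOLLOW. */
	(void)unlink("link");			/* illustrative name */
	if (symlink("target", "link") == -1) {
		perror("symlink");
		return (1);
	}
	fd = open("link", O_RDONLY | O_NOFOLLOW);
	if (fd == -1) {
		/* POSIX says ELOOP; FreeBSD of this era returned EMLINK. */
		printf("open: errno=%d (%s)\n", errno, strerror(errno));
		printf("ELOOP=%d EMLINK=%d\n", ELOOP, EMLINK);
	} else
		close(fd);
	(void)unlink("link");
	return (0);
}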
-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 13 19:40:33 2011
Date: Sun, 13 Mar 2011 20:40:19 +0100
From: Pawel Jakub Dawidek
To: Iurii Radomskyi
Cc: freebsd-fs@FreeBSD.org
Subject: Re: HAST wiki outdated

On Thu, Mar 10, 2011 at 05:38:05PM +0200, Iurii Radomskyi wrote:
> Hello.
>
> In this wiki article: http://wiki.freebsd.org/HAST there is info on
> "Replication modes". It is outdated compared to the hast.conf man page
> in the 8.2 release, in the following parts:
> 1. the last sentence of memsync
> 2. the last sentence of fullsync
>
> Maybe something else too; I'm not sure. :)
>
> If you reply, please CC me, as I am not a member of the list.

Thanks, should be fixed already.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 14 11:07:00 2011
Date: Mon, 14 Mar 2011 11:06:59 GMT
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use:
http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD
users. These represent problem reports covering all versions including
experimental development code and obsolete releases.
S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/155484 fs [ufs] GPT + UFS boot don't work well together
o kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104 fs [zfs][patch] use /dev prefix by default when importing
o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828 fs [msdosfs] Unable to create directories on external USB
o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447 fs [zfs] [panic] Occasional panics - solaris assert somew
f kern/154228 fs [md] md getting stuck in wdrain state
o kern/153996 fs [zfs] zfs root mount error while kernel is not located
o kern/153847 fs [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716 fs [zfs] zpool scrub time remaining is incorrect
o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions
o kern/153552 fs [zfs] zfsboot from 8.2-RC1 freeze at boot time
o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351 fs [zfs] locking directories/files in ZFS
o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488 fs [tmpfs] [patch] mtime of file updated when only inode
o kern/152079 fs [msdosfs] [patch] Small cleanups from the other NetBSD
o kern/152022 fs [nfs] nfs service hangs with linux client [regression]
o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory
o kern/151905 fs [zfs] page fault under load in /sbin/zfs
o kern/151845 fs [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648 fs [zfs] disk wait bug
o kern/151629 fs [fs] [patch] Skip empty directory entries during name
o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a
o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251 fs [ufs] Can not create files on filesystem with heavy us
o kern/151226 fs [zfs] can't delete zfs snapshot
o kern/151111 fs [zfs] vnodes leakage during zfs unmount
o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207 fs zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208 fs mksnap_ffs(8) hang/deadlock
o kern/149173 fs [patch] [zfs] make OpenSolaris installa
f kern/149022 fs [hang] File system operations hangs with suspfs state
o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296 fs [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204 fs [nfs] UDP NFS causes overload
o kern/148138 fs [zfs] zfs raidz pool commands freeze
o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790 fs [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786 fs [zfs] zpool import hangs with checksum errors
o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528 fs [zfs] Severe memory leak in ZFS on i386
o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev
o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank
o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189 fs [nfs] nfsd performs abysmally under load
o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416 fs [panic] Kernel panic on online filesystem optimization
s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825 fs [nfs] [panic] Kernel panic on NFS client
o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212 fs [nfs] NFSv4 client strange work ...
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR
o kern/142914 fs [zfs] ZFS performance degradation over time
o kern/142878 fs [zfs] [vfs] lock order reversal
o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real
o kern/142489 fs [zfs] [lor] allproc/zfs LOR
o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401 fs [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068 fs [ufs] BSD labels are got deleted spontaneously
o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri
o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640 fs [zfs] snapshot crash
o kern/140134 fs [msdosfs] write and fsck destroy filesystem integrity
o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs
p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot
o kern/138662 fs [panic] ffs_blkfree: freeing free block
o kern/138421 fs [ufs] [patch] remove UFS label limitations
o kern/138202 fs mount_msdosfs(1) see only 2Gb
o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873 fs [ntfs] Missing directories/files on NTFS volume
o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS
o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot
o kern/134491 fs [zfs] Hot spares are rather cold...
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int
o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397 fs reboot causes filesystem corruption (failure to sync b
o kern/132331 fs [ufs] [lor] LOR ufs and syncer
o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145 fs [panic] File System Hard Crashes
o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo
o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin
o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210 fs [nullfs] Error by check nullfs
o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8)
o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029 fs [panic] mount(8): trying to mount a write protected zi
o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS
o kern/123939 fs [msdosfs] corrupts new files
o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991 fs [panic] [ffs] [snapshot] System crashes when manipulat
o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes
o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912 fs [2tb] disk sizing/geometry problem with large array
o kern/118713 fs [minidump] [patch] Display media size required for a k
o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime
o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954 fs [ufs] dirhash on very large directories blocks the mac
o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with
o kern/116583 fs [ffs] [hang] System freezes for short time when using
o kern/116170 fs [panic] Kernel panic when mounting /tmp
o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un
o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468 fs [patch] [request] add -d option to umount(8) to detach
o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral
o bin/113838 fs [patch] [request] mount(8): add support for relative p
o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show
o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843 fs [msdosfs] Long Names of files are incorrectly created
o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems
s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem
o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010 fs [msdosfs] can't mv directory within fat32 file system
o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk
o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist
o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear
o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498 fs [request] newfs(8) has no option to clear the first 12
o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849 fs [ufs] rename on UFS filesystem is not atomic
o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean'
o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733 fs [smbfs] smbfs may cause double unlock
o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134 fs [smbfs] [patch] Preserve access and modification time
a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet
o kern/88657 fs [smbfs] windows client hang when browsing a samba shar
o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859 fs [smbfs] System reboot while umount smbfs.
o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779 fs Background-fsck checks one filesystem twice and omits
o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun
o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange
o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503 fs [smbfs] mount_smbfs does not work as non-root
o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc
o kern/51583 fs [nullfs] [patch] allow to work with devices and socket
o kern/36566 fs [smbfs] System reboot with dead smb mount and umount
o kern/33464 fs [ufs] soft update inconsistencies after system crash
o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc
o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t

219 problems total.
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 15 12:17:23 2011
Date: Tue, 15 Mar 2011 20:17:11 +0800
From: gnehzuil
To: freebsd-fs@freebsd.org
Cc: "Pedro F. Giffuni"
Subject: [ext2fs][patch] reallocblks

Hi there,

I have implemented reallocblks in ext2fs. I added some structures to
m_ext2fs to record cluster summary information, because the group
descriptor has no on-disk structure to record this data, so I keep it
in memory. The implementation is almost the same as in ffs.

I have done some simple benchmarks with dbench. This patch can improve
the performance a little.
Best regards,
lz

Content-Type: text/x-patch; name="patch-realloc.diff"
Content-Disposition: attachment; filename="patch-realloc.diff"

diff -u ext2fs/ext2_alloc.c ext2fs_realloc/ext2_alloc.c
--- ext2fs/ext2_alloc.c	2011-03-15 19:57:35.000000000 +0000
+++ ext2fs_realloc/ext2_alloc.c	2011-03-15 19:53:56.000000000 +0000
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -52,6 +53,7 @@
 #include

 static daddr_t ext2_alloccg(struct inode *, int, daddr_t, int);
+static daddr_t ext2_clusteralloc(struct inode *, int, daddr_t, int);
 static u_long ext2_dirpref(struct inode *);
 static void ext2_fserr(struct m_ext2fs *, uid_t, char *);
 static u_long ext2_hashalloc(struct inode *, int, long, int,
@@ -59,9 +61,6 @@
 		    int));
 static daddr_t ext2_nodealloccg(struct inode *, int, daddr_t, int);
 static daddr_t ext2_mapsearch(struct m_ext2fs *, char *, daddr_t);
-#ifdef FANCY_REALLOC
-static int ext2_reallocblks(struct vop_reallocblks_args *);
-#endif

 /*
  * Allocate a block in the file system.
@@ -150,7 +149,6 @@
  * the previous block allocation will be used.
  */

-#ifdef FANCY_REALLOC
 SYSCTL_NODE(_vfs, OID_AUTO, ext2fs, CTLFLAG_RW, 0, "EXT2FS filesystem");

 static int doasyncfree = 1;
@@ -159,7 +157,6 @@
 static int doreallocblks = 1;
 SYSCTL_INT(_vfs_ext2fs, OID_AUTO, doreallocblks, CTLFLAG_RW,
     &doreallocblks, 0, "");
-#endif

 int
 ext2_reallocblks(ap)
@@ -168,11 +165,6 @@
 		struct cluster_save *a_buflist;
 	} */ *ap;
 {
-#ifndef FANCY_REALLOC
-/* printf("ext2_reallocblks not implemented\n"); */
-return ENOSPC;
-#else
-
 	struct m_ext2fs *fs;
 	struct inode *ip;
 	struct vnode *vp;
@@ -181,17 +173,19 @@
 	struct ext2mount *ump;
 	struct cluster_save *buflist;
 	struct indir start_ap[NIADDR + 1], end_ap[NIADDR + 1], *idp;
-	int32_t start_lbn, end_lbn, soff, newblk, blkno =0;
+	int32_t start_lbn, end_lbn, soff, newblk, blkno;
 	int i, len, start_lvl, end_lvl, pref, ssize;

 	vp = ap->a_vp;
 	ip = VTOI(vp);
 	fs = ip->i_e2fs;
 	ump = ip->i_ump;
-#ifdef UNKLAR
-	if (fs->fs_contigsumsize <= 0)
+
+	if (doreallocblks == 0)
 		return (ENOSPC);
-#endif
+	if (fs->e2fs_contigsumsize <= 0)
+		return (ENOSPC);
+
 	buflist = ap->a_buflist;
 	len = buflist->bs_nchildren;
 	start_lbn = buflist->bs_children[0]->b_lblkno;
@@ -228,11 +222,6 @@
 		soff = idp->in_off;
 	}
 	/*
-	 * Find the preferred location for the cluster.
-	 */
-	EXT2_LOCK(ump);
-	pref = ext2_blkpref(ip, start_lbn, soff, sbap, blkno);
-	/*
 	 * If the block range spans two block maps, get the second map.
 	 */
 	if (end_lvl == 0 || (idp = &end_ap[end_lvl - 1])->in_off + 1 >= len) {
@@ -243,13 +232,16 @@
 			panic("ext2_reallocblk: start == end");
 #endif
 		ssize = len - (idp->in_off + 1);
-		if (bread(vp, idp->in_lbn, (int)fs->e2fs_bsize, NOCRED, &ebp)){
-			EXT2_UNLOCK(ump);
+		if (bread(vp, idp->in_lbn, (int)fs->e2fs_bsize, NOCRED, &ebp))
 			goto fail;
-		}
 		ebap = (int32_t *)ebp->b_data;
 	}
 	/*
+	 * Find the preferred location for the cluster.
+	 */
+	EXT2_LOCK(ump);
+	pref = ext2_blkpref(ip, start_lbn, soff, sbap, 0);
+	/*
 	 * Search the block map looking for an allocation of the desired size.
 	 */
 	if ((newblk = (int32_t)ext2_hashalloc(ip, dtog(fs, pref), pref,
@@ -264,15 +256,23 @@
 	 * block pointers in the inode and indirect blocks associated
 	 * with the file.
 	 */
+#ifdef DEBUG
+	printf("realloc: ino %d, lbns %jd-%jd\n\told:", ip->i_number,
+	    (intmax_t)start_lbn, (intmax_t)end_lbn);
+#endif /* DEBUG */
 	blkno = newblk;
 	for (bap = &sbap[soff], i = 0; i < len; i++, blkno += fs->e2fs_fpb) {
-		if (i == ssize)
+		if (i == ssize) {
 			bap = ebap;
 			soff = -i;
+		}
 #ifdef DIAGNOSTIC
 		if (buflist->bs_children[i]->b_blkno != fsbtodb(fs, *bap))
 			panic("ext2_reallocblks: alloc mismatch");
 #endif
+#ifdef DEBUG
+		printf(" %d,", *bap);
+#endif /* DEBUG */
 		*bap++ = blkno;
 	}
 	/*
@@ -308,11 +308,20 @@
 	/*
 	 * Last, free the old blocks and assign the new blocks to the buffers.
 	 */
+#ifdef DEBUG
+	printf("\n\tnew:");
+#endif /* DEBUG */
 	for (blkno = newblk, i = 0; i < len; i++, blkno += fs->e2fs_fpb) {
 		ext2_blkfree(ip, dbtofsb(fs, buflist->bs_children[i]->b_blkno),
 		    fs->e2fs_bsize);
 		buflist->bs_children[i]->b_blkno = fsbtodb(fs, blkno);
-	}
+#ifdef DEBUG
+		printf(" %d,", blkno);
+#endif /* DEBUG */
+	}
+#ifdef DEBUG
+	printf("\n");
+#endif /* DEBUG */

 	return (0);

fail:
@@ -321,8 +330,6 @@
 	if (sbap != &ip->i_db[0])
 		brelse(sbp);
 	return (ENOSPC);
-
-#endif /* FANCY_REALLOC */
 }

 /*
@@ -747,6 +754,7 @@
 #endif
 	setbit(bbp, bno);
 	EXT2_LOCK(ump);
+	ext2_clusteracct(fs, bbp, cg, bno, -1);
 	fs->e2fs->e2fs_fbcount--;
 	fs->e2fs_gd[cg].ext2bgd_nbfree--;
 	fs->e2fs_fmod = 1;
@@ -755,6 +763,113 @@
 	return (cg * fs->e2fs->e2fs_fpg + fs->e2fs->e2fs_first_dblock + bno);
 }

+static daddr_t
+ext2_clusteralloc(struct inode *ip, int cg, daddr_t bpref, int len)
+{
+	struct m_ext2fs *fs;
+	struct ext2mount *ump;
+	struct buf *bp;
+	char *bbp;
+	int bit, error, got, i, loc, run;
+	int32_t *lp;
+	daddr_t bno;
+
+	fs = ip->i_e2fs;
+	ump = ip->i_ump;
+
+	if (fs->e2fs_maxcluster[cg] < len)
+		return (0);
+
+	EXT2_UNLOCK(ump);
+	error = bread(ip->i_devvp,
+	    fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap),
+	    (int)fs->e2fs_bsize, NOCRED, &bp);
+	if (error)
+		goto fail_lock;
+
+	bbp = (char *)bp->b_data;
+	bp->b_xflags |= BX_BKGRDWRITE;
+
+	EXT2_LOCK(ump);
+	/*
+	 * Check to see if a cluster of the needed size (or bigger) is
+	 * available in this cylinder group.
+	 */
+	lp = &fs->e2fs_clustersum[cg].cs_sum[len];
+	for (i = len; i <= fs->e2fs_contigsumsize; i++)
+		if (*lp++ > 0)
+			break;
+	if (i > fs->e2fs_contigsumsize) {
+		/*
+		 * Update the cluster summary information to reflect
+		 * the true maximum sized cluster so that future cluster
+		 * allocation requests can avoid reading the bitmap only
+		 * to find no cluster.
+		 */
+		lp = &fs->e2fs_clustersum[cg].cs_sum[len - 1];
+		for (i = len - 1; i > 0; i--)
+			if (*lp-- > 0)
+				break;
+		fs->e2fs_maxcluster[cg] = i;
+		goto fail;
+	}
+	EXT2_UNLOCK(ump);
+
+	/* Search the bitmap to find a big enough cluster, like ffs. */
+	if (dtog(fs, bpref) != cg)
+		bpref = 0;
+	if (bpref != 0)
+		bpref = dtogd(fs, bpref);
+	loc = bpref / NBBY;
+	bit = 1 << (bpref % NBBY);
+	for (run = 0, got = bpref; got < fs->e2fs->e2fs_fpg; got++) {
+		if ((bbp[loc] & bit) != 0)
+			run = 0;
+		else {
+			run++;
+			if (run == len)
+				break;
+		}
+		if ((got & (NBBY - 1)) != (NBBY - 1))
+			bit <<= 1;
+		else {
+			loc++;
+			bit = 1;
+		}
+	}
+
+	if (got >= fs->e2fs->e2fs_fpg)
+		goto fail_lock;
+
+	/* Allocate the cluster that we have found. */
+	for (i = 1; i < len; i++)
+		if (!isclr(bbp, got - run + i))
+			panic("ext2_clusteralloc: map mismatch");
+
+	bno = got - run + 1;
+	if (bno >= fs->e2fs->e2fs_fpg)
+		panic("ext2_clusteralloc: allocated out of group");
+
+	EXT2_LOCK(ump);
+	for (i = 0; i < len; i += fs->e2fs_fpb) {
+		setbit(bbp, bno + i);
+		ext2_clusteracct(fs, bbp, cg, bno + i, -1);
+		fs->e2fs->e2fs_fbcount--;
+		fs->e2fs_gd[cg].ext2bgd_nbfree--;
+	}
+	fs->e2fs_fmod = 1;
+	EXT2_UNLOCK(ump);
+
+	bdwrite(bp);
+	return (cg * fs->e2fs->e2fs_fpg + fs->e2fs->e2fs_first_dblock + bno);
+
+fail_lock:
+	EXT2_LOCK(ump);
+fail:
+	brelse(bp);
+	return (0);
+}
+
 /*
  * Determine whether an inode can be allocated.
  *
@@ -877,6 +992,7 @@
 	}
 	clrbit(bbp, bno);
 	EXT2_LOCK(ump);
+	ext2_clusteracct(fs, bbp, cg, bno, 1);
 	fs->e2fs->e2fs_fbcount++;
 	fs->e2fs_gd[cg].ext2bgd_nbfree++;
 	fs->e2fs_fmod = 1;
diff -u ext2fs/ext2_extern.h ext2fs_realloc/ext2_extern.h
--- ext2fs/ext2_extern.h	2011-03-15 19:57:35.000000000 +0000
+++ ext2fs_realloc/ext2_extern.h	2011-03-15 19:53:56.000000000 +0000
@@ -55,6 +55,7 @@
 int32_t ext2_blkpref(struct inode *, int32_t, int, int32_t *, int32_t);
 int ext2_bmap(struct vop_bmap_args *);
 int ext2_bmaparray(struct vnode *, int32_t, int32_t *, int *, int *);
+void ext2_clusteracct(struct m_ext2fs *, char *, int, daddr_t, int);
 void ext2_dirbad(struct inode *ip, doff_t offset, char *how);
 void ext2_ei2i(struct ext2fs_dinode *, struct inode *);
 int ext2_getlbns(struct vnode *, int32_t, struct indir *, int *);
diff -u ext2fs/ext2_subr.c ext2fs_realloc/ext2_subr.c
--- ext2fs/ext2_subr.c	2011-03-15 19:57:35.000000000 +0000
+++ ext2fs_realloc/ext2_subr.c	2011-03-15 19:53:56.000000000 +0000
@@ -120,3 +120,107 @@
 	}
 }
 #endif /* KDB */
+
+/*
+ * Update the cluster map because of an allocation or free, like ffs.
+ *
+ * Cnt == 1 means free; cnt == -1 means allocating.
+ */
+void
+ext2_clusteracct(struct m_ext2fs *fs, char *bbp, int cg, daddr_t bno, int cnt)
+{
+	int32_t *sump = fs->e2fs_clustersum[cg].cs_sum;
+	int32_t *lp;
+	int back, bit, end, forw, i, loc, start;
+
+	/* Initialize the cluster summary array. */
+	if (fs->e2fs_clustersum[cg].cs_init == 0) {
+		int run = 0;
+		bit = 1;
+		loc = 0;
+
+		for (i = 0; i < fs->e2fs->e2fs_fpg; i++) {
+			if ((bbp[loc] & bit) == 0)
+				run++;
+			else if (run != 0) {
+				if (run > fs->e2fs_contigsumsize)
+					run = fs->e2fs_contigsumsize;
+				sump[run]++;
+				run = 0;
+			}
+			if ((i & (NBBY - 1)) != (NBBY - 1))
+				bit <<= 1;
+			else {
+				loc++;
+				bit = 1;
+			}
+		}
+		if (run != 0) {
+			if (run > fs->e2fs_contigsumsize)
+				run = fs->e2fs_contigsumsize;
+			sump[run]++;
+		}
+		fs->e2fs_clustersum[cg].cs_init = 1;
+	}
+
+	if (fs->e2fs_contigsumsize <= 0)
+		return;
+
+	/* Find the size of the cluster going forward. */
+	start = bno + 1;
+	end = start + fs->e2fs_contigsumsize;
+	if (end > fs->e2fs->e2fs_fpg)
+		end = fs->e2fs->e2fs_fpg;
+	loc = start / NBBY;
+	bit = 1 << (start % NBBY);
+	for (i = start; i < end; i++) {
+		if ((bbp[loc] & bit) != 0)
+			break;
+		if ((i & (NBBY - 1)) != (NBBY - 1))
+			bit <<= 1;
+		else {
+			loc++;
+			bit = 1;
+		}
+	}
+	forw = i - start;
+
+	/* Find the size of the cluster going backward. */
+	start = bno - 1;
+	end = start - fs->e2fs_contigsumsize;
+	if (end < 0)
+		end = -1;
+	loc = start / NBBY;
+	bit = 1 << (start % NBBY);
+	for (i = start; i > end; i--) {
+		if ((bbp[loc] & bit) != 0)
+			break;
+		if ((i & (NBBY - 1)) != 0)
+			bit >>= 1;
+		else {
+			loc--;
+			bit = 1 << (NBBY - 1);
+		}
+	}
+	back = start - i;
+
+	/*
+	 * Account for old cluster and the possibly new forward and
+	 * back clusters.
+ */ + i = back + forw + 1; + if (i > fs->e2fs_contigsumsize) + i = fs->e2fs_contigsumsize; + sump[i] += cnt; + if (back > 0) + sump[back] -= cnt; + if (forw > 0) + sump[forw] -= cnt; + + /* Update cluster summary information. */ + lp = &sump[fs->e2fs_contigsumsize]; + for (i = fs->e2fs_contigsumsize; i > 0; i--) + if (*lp-- > 0) + break; + fs->e2fs_maxcluster[cg] = i; +} diff -u ext2fs/ext2_vfsops.c ext2fs_realloc/ext2_vfsops.c --- ext2fs/ext2_vfsops.c 2011-03-15 19:57:35.000000000 +0000 +++ ext2fs_realloc/ext2_vfsops.c 2011-03-15 19:53:56.000000000 +0000 @@ -405,7 +405,7 @@ * Things to do to update the mount: * 1) invalidate all cached meta-data. * 2) re-read superblock from disk. - * 3) re-read summary information from disk. + * 3) invalidate all cluster summary information. * 4) invalidate all inactive vnodes. * 5) invalidate all cached file data. * 6) re-read inode data for all active vnodes. @@ -419,7 +419,9 @@ struct buf *bp; struct ext2fs *es; struct m_ext2fs *fs; - int error; + struct csum *sump; + int error, i; + int32_t *lp; if ((mp->mnt_flag & MNT_RDONLY) == 0) return (EINVAL); @@ -456,6 +458,19 @@ #endif brelse(bp); + /* + * Step 3: invalidate all cluster summary information. + */ + if (fs->e2fs_contigsumsize > 0) { + lp = fs->e2fs_maxcluster; + sump = fs->e2fs_clustersum; + for (i = 0; i < fs->e2fs_gcount; i++, sump++) { + *lp++ = fs->e2fs_contigsumsize; + sump->cs_init = 0; + bzero(sump->cs_sum, fs->e2fs_contigsumsize + 1); + } + } + loop: MNT_ILOCK(mp); MNT_VNODE_FOREACH(vp, mp, mvp) { @@ -511,8 +526,11 @@ struct cdev *dev = devvp->v_rdev; struct g_consumer *cp; struct bufobj *bo; + struct csum *sump; int error; int ronly; + int i, size; + int32_t *lp; ronly = vfs_flagopt(mp->mnt_optnew, "ro", NULL, 0); /* XXX: use VOP_ACESS to check FS perms */ @@ -582,6 +600,33 @@ if ((error = compute_sb_data(devvp, ump->um_e2fs->e2fs, ump->um_e2fs))) goto out; + /* + * We calculate the max contiguous blks and size of cluster summary + * array. In ffs, these works are done in newfs. But superblock in + * ext2fs doesn't have these variables. So we just can calculate them + * in here. 
+ */ + ump->um_e2fs->e2fs_maxcontig = MAX(1, MAXPHYS / ump->um_e2fs->e2fs_bsize); + if (ump->um_e2fs->e2fs_maxcontig > 0) + ump->um_e2fs->e2fs_contigsumsize = + MIN(ump->um_e2fs->e2fs_maxcontig, EXT2_MAXCONTIG); + else + ump->um_e2fs->e2fs_contigsumsize = 0; + if (ump->um_e2fs->e2fs_contigsumsize > 0) { + size = ump->um_e2fs->e2fs_gcount * sizeof(int32_t); + ump->um_e2fs->e2fs_maxcluster = malloc(size, M_EXT2MNT, M_WAITOK); + size = ump->um_e2fs->e2fs_gcount * sizeof(struct csum); + ump->um_e2fs->e2fs_clustersum = malloc(size, M_EXT2MNT, M_WAITOK); + lp = ump->um_e2fs->e2fs_maxcluster; + sump = ump->um_e2fs->e2fs_clustersum; + for (i = 0; i < ump->um_e2fs->e2fs_gcount; i++, sump++) { + *lp++ = ump->um_e2fs->e2fs_contigsumsize; + sump->cs_init = 0; + sump->cs_sum = malloc((ump->um_e2fs->e2fs_contigsumsize + 1) * + sizeof(int32_t), M_EXT2MNT, M_WAITOK | M_ZERO); + } + } + brelse(bp); bp = NULL; fs = ump->um_e2fs; @@ -656,7 +701,8 @@ { struct ext2mount *ump; struct m_ext2fs *fs; - int error, flags, ronly; + struct csum *sump; + int error, flags, i, ronly; flags = 0; if (mntflags & MNT_FORCE) { @@ -681,6 +727,11 @@ g_topology_unlock(); PICKUP_GIANT(); vrele(ump->um_devvp); + sump = fs->e2fs_clustersum; + for (i = 0; i < fs->e2fs_gcount; i++, sump++) + free(sump->cs_sum, M_EXT2MNT); + free(fs->e2fs_clustersum, M_EXT2MNT); + free(fs->e2fs_maxcluster, M_EXT2MNT); free(fs->e2fs_gd, M_EXT2MNT); free(fs->e2fs_contigdirs, M_EXT2MNT); free(fs->e2fs, M_EXT2MNT); diff -u ext2fs/ext2fs.h ext2fs_realloc/ext2fs.h --- ext2fs/ext2fs.h 2011-03-15 19:57:35.000000000 +0000 +++ ext2fs_realloc/ext2fs.h 2011-03-15 19:53:56.000000000 +0000 @@ -45,6 +45,17 @@ #define EXT2_LINK_MAX 32000 /* + * A summary of contiguous blocks of various sizes in maintained + * in each cylinder group. Normally this is set by the initial + * value of fs_maxcontig. To conserve space, a maximum summary size + * is set by EXT2_MAXCONTIG. + * + * XXX:FS_MAXCONTIG is set to 16 to conserve space. Here we set it to + * 32 for performance. + */ +#define EXT2_MAXCONTIG 32 + +/* * Constants relative to the data blocks */ #define EXT2_NDIR_BLOCKS 12 @@ -140,6 +151,10 @@ char e2fs_wasvalid; /* valid at mount time */ off_t e2fs_maxfilesize; struct ext2_gd *e2fs_gd; /* Group Descriptors */ + int32_t e2fs_maxcontig; /* max number of contiguous blks */ + int32_t e2fs_contigsumsize; /* size of cluster summary array */ + int32_t *e2fs_maxcluster; /* max cluster in each cyl group */ + struct csum *e2fs_clustersum; /* cluster summary in each cyl group */ }; /* @@ -242,6 +257,13 @@ u_int32_t reserved2[3]; }; +/* cluster summary information */ + +struct csum { + int8_t cs_init; /* cluster summary has been initialized */ + int32_t *cs_sum; /* cluster summary array */ +}; + /* EXT2FS metadatas are stored in little-endian byte order. 
These macros * helps reading these metadatas */ --------------040509040101080104050601-- From owner-freebsd-fs@FreeBSD.ORG Tue Mar 15 13:05:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3FC10106566B for ; Tue, 15 Mar 2011 13:05:00 +0000 (UTC) (envelope-from cforgeron@acsi.ca) Received: from mta03.eastlink.ca (mta03.eastlink.ca [24.224.136.9]) by mx1.freebsd.org (Postfix) with ESMTP id F211B8FC15 for ; Tue, 15 Mar 2011 13:04:59 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from ip05.eastlink.ca ([unknown] [24.222.39.68]) by mta03.eastlink.ca (Sun Java(tm) System Messaging Server 7.3-11.01 64bit (built Sep 1 2009)) with ESMTP id <0LI300EQAOCAWX34@mta03.eastlink.ca>; Tue, 15 Mar 2011 10:04:58 -0300 (ADT) X-CMAE-Score: 0 X-CMAE-Analysis: v=1.1 cv=8reSTVRqS4Rq5Xx4Jai9N41eZpHz3D5gSX5rA0od4mg= c=1 sm=1 a=IkcTkHD0fZMA:10 a=jAMlq-imAAAA:8 a=6I5d2MoRAAAA:8 a=kzlirbLKg49L1WvDEZcA:9 a=L5Jhr7aGExt5yAe39wFeUE7kpP4A:4 a=QEXdDO2ut3YA:10 a=pyy963zsynMA:10 a=fgf5PR_cwQYA:10 a=xraHq2XsALYA:10 a=SV7veod9ZcQA:10 a=/bLbuBD0lrv91xL1PDQKaA==:117 Received: from blk-222-10-85.eastlink.ca (HELO server7.acsi.ca) ([24.222.10.85]) by ip05.eastlink.ca with ESMTP; Tue, 15 Mar 2011 10:04:58 -0300 Received: from server7.acsi.ca ([192.168.9.7]) by server7.acsi.ca ([192.168.9.7]) with mapi; Tue, 15 Mar 2011 10:04:58 -0300 From: Chris Forgeron To: Alexander Leidinger Date: Tue, 15 Mar 2011 10:04:57 -0300 Thread-topic: Constant minor ZFS corruption Thread-index: Acvf+gGLdpbeVSjRTJy3LRVGIfgXiADF18Cg Message-id: References: <201103081425.p28EPQtM002115@dungeon.home> <201103091241.p29CfUM1003302@dungeon.home> <20110311150027.153506yognqhzx18@webmail.leidinger.net> In-reply-to: <20110311150027.153506yognqhzx18@webmail.leidinger.net> Accept-Language: en-US Content-language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Cc: "freebsd-fs@freebsd.org" , Stephen McKay Subject: RE: Constant minor ZFS corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2011 13:05:00 -0000 How big are your pools? I've never had one under 6 TB, which could be my problems.. or that it's been years since I've tried i386 for anything other than a firewall. -----Original Message----- From: Alexander Leidinger [mailto:Alexander@Leidinger.net] Sent: Friday, March 11, 2011 10:00 AM To: Chris Forgeron Cc: Stephen McKay; Mark Felder; freebsd-fs@freebsd.org Subject: RE: Constant minor ZFS corruption Quoting Chris Forgeron (from Thu, 10 Mar 2011 16:43:43 -0400): > Oh - and you're AMD64, correct, not i386? I think we (royal we) should > remove support for i385 in ZFS, it has never been stable for me, and I > see a lot of grief about it on the boards. I also think you need 8 GB > of RAM to play seriously. I've had reasonable success with 4GB and a > light load, but any serious file traffic needs 8GB of breathing room > as ZFS gobbles up the RAM in a very aggressive manner. Veto! I have two x86 machines, one with "only" 768 MB RAM. Both of them run with ZFS without problems. The scenario I use them in may not be the scenario you need to provide a machine for, but there are scenarios where ZFS on x86 works. Bye, Alexander. 
-- BOFH excuse #113: Root nameservers are out of sync http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Tue Mar 15 13:13:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 013E1106566B for ; Tue, 15 Mar 2011 13:13:43 +0000 (UTC) (envelope-from cforgeron@acsi.ca) Received: from mta01.eastlink.ca (mta01.eastlink.ca [24.224.136.30]) by mx1.freebsd.org (Postfix) with ESMTP id B5A918FC14 for ; Tue, 15 Mar 2011 13:13:42 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from ip08.eastlink.ca ([unknown] [24.222.39.108]) by mta01.eastlink.ca (Sun Java(tm) System Messaging Server 7u3-12.01 64bit (built Oct 15 2009)) with ESMTP id <0LI300FKXOQTBG64@mta01.eastlink.ca>; Tue, 15 Mar 2011 10:13:41 -0300 (ADT) X-CMAE-Score: 0 X-CMAE-Analysis: v=1.1 cv=b0sI0M7bjhCmEOs51LbeKzGQ5ECIs9m+H5QCeOcUmtc= c=1 sm=1 a=kj9zAlcOel0A:10 a=Npn9PEg5AAAA:8 a=6I5d2MoRAAAA:8 a=DPOZ5NiI2eKhm46LzvgA:9 a=V-VHjyien0K8KxRIJl8A:7 a=8yGsVx6g0p2fwXpNWN5WY_-bApgA:4 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10 a=wxKChNxVwVA7eZEm:21 a=-HGEFCrujasmXfj1:21 a=8Oiiw5Yrij1t5t1/bbvJYQ==:117 Received: from blk-222-10-85.eastlink.ca (HELO server7.acsi.ca) ([24.222.10.85]) by ip08.eastlink.ca with ESMTP; Tue, 15 Mar 2011 10:13:41 -0300 Received: from server7.acsi.ca ([192.168.9.7]) by server7.acsi.ca ([192.168.9.7]) with mapi; Tue, 15 Mar 2011 10:13:41 -0300 From: Chris Forgeron To: Stephen McKay Date: Tue, 15 Mar 2011 10:13:40 -0300 Thread-topic: Constant minor ZFS corruption Thread-index: AcvfgDygEus2TL1eSsaDTyZUuXLxJwDkVxVw Message-id: References: <201103081425.p28EPQtM002115@dungeon.home> <201103091241.p29CfUM1003302@dungeon.home> <201103102319.p2ANJWxN002125@dungeon.home> In-reply-to: <201103102319.p2ANJWxN002125@dungeon.home> Accept-Language: en-US Content-language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Cc: "freebsd-fs@freebsd.org" Subject: RE: Constant minor ZFS corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2011 13:13:43 -0000 Hey, - If you run Current in a semi-production env, you don't build daily/weekly, you build once, test a lot, and then don't budge until you need to. There's plenty of bug fixes, but if they are not in what you're using, it usually won't matter. It's why I recommend a very minimal kernel, to reduce exposure to bugs. At the same time, I'm finding 9 to really be shaping up nicely, this isn't like a beta from other sources, it's actually quite useable thanks to all the hard work from committers. Stick with gcc as your compiler for now, and you should be fine. At the state you're in right now, I don't think you could be more unstable. :-) - I found better ZFS performance in 9.0-Current than 8.2-PRE back in Dec 2010, not huge, but enough for me to brave the waters of -CURRENT - 2 Gig is very low, but yes, it won't cause corruption. If anything, low performance and a higher chance of a panic. Try that system with 8 GB, and you'll notice the difference for random I/O after the ARC fills up. - I'm not familiar with the PIKE card - Do you have enough SATA ports on the MB to connect a few drives to, to see if your issues go away? 
We use the SuperMicro AOC LSI 2008 cards, and they look to be working well for us so far. -----Original Message----- From: smckay@internode.on.net [mailto:smckay@internode.on.net] On Behalf Of Stephen McKay Sent: Thursday, March 10, 2011 7:20 PM To: Chris Forgeron Cc: Stephen McKay; Mark Felder; freebsd-fs@freebsd.org Subject: Re: Constant minor ZFS corruption On Thursday, 10th March 2011, Chris Forgeron wrote: >You know, I've had better luck with v28 and FreeBSD-9-CURRENT. Make a >very minimal compile, test it well, and you should be fine. I just >upgraded my last 8.2 v14 ZFS FreeBSD system earlier this week, so I'm >now 9-Current with v28 across the board. The only issue I've found so >far is a small oddity with displaying files across ZFS, but pjd has >already patched that in r219404. (I'm about to test it now) We are OK using -current if we really have to, but would prefer to stick with an official release (maybe with one or two hand-rolled patches if they are important enough). We've already noticed the -current "upgrade treadmill", having to build a new kernel every day of our testing because important bug fixes are arriving. And in the end, we saw no difference in behaviour, so -current doesn't fix our problems. It's important to test -current, but not in production. :-) >Oh - and you're AMD64, correct, not i386? I think we (royal we) should >remove support for i385 in ZFS, it has never been stable for me, and I >see a lot of grief about it on the boards. I also think you need 8 GB >of RAM to play seriously. I've had reasonable success with 4GB and a >light load, but any serious file traffic needs 8GB of breathing room as >ZFS gobbles up the RAM in a very aggressive manner. Yes, we are running the amd64 kernel. Currently we're low on memory (2GB) because I swapped out the RAM, but that, again, didn't affect our failures. >Lastly, check what Mike Tancsa said about his hardware - All of my gear >is quality, 1000W dual redundant power supplies, LSI SAS controllers, >ECC registered ram, no overclocking, etc, etc. You may have a software >issue, but it's more likely that ZFS is just exposing some instability >in your system. Has your RAM checked out with a Memtest run overnight? >We're talking small, intermittent errors here, not big red flags that >will be obvious to spot. The ASUS PIKE2008 card is LSI based. Our RAM is ECC. We're not overclocking (in fact I disabled turbo-boost). We haven't run memtest but we have done a few "make buildworld" runs. All of these completed without error. And with ECC RAM, we should see log messages if anything is wrong there anyway. We have tried to buy quality hardware. At least, we didn't deliberately skimp (except to build our own box vs buy a big name brand pre-built zfs server). We're starting to get suspicious of the PIKE card though. Is there anyone here who is using an ASUS PIKE2008 (as opposed to other LSI SAS 2008 cards)? We're kinda wishing we'd gotten an older PIKE 1068E instead... Cheers, Stephen.
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 15 15:08:26 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 78FA7106566C; Tue, 15 Mar 2011 15:08:26 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 313528FC13; Tue, 15 Mar 2011 15:08:25 +0000 (UTC) Received: from outgoing.leidinger.net (p5B1557F0.dip.t-dialin.net [91.21.87.240]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id A7B39844015; Tue, 15 Mar 2011 15:31:01 +0100 (CET) Received: from webmail.leidinger.net (unknown [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 8A3522C4E; Tue, 15 Mar 2011 15:30:58 +0100 (CET) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p2FEUf76019609; Tue, 15 Mar 2011 15:30:41 +0100 (CET) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Tue, 15 Mar 2011 15:30:41 +0100 Message-ID: <20110315153041.34452iu72hoo0000@webmail.leidinger.net> Date: Tue, 15 Mar 2011 15:30:41 +0100 From: Alexander Leidinger To: Chris Forgeron References: <201103081425.p28EPQtM002115@dungeon.home> <201103091241.p29CfUM1003302@dungeon.home> <20110311150027.153506yognqhzx18@webmail.leidinger.net> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: A7B39844015.A647E X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=1.274, required 6, autolearn=disabled, RDNS_NONE 1.27) X-EBL-MailScanner-SpamScore: s X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1300804262.25906@m2gnsuCDsUwabODaYS+yfw X-EBL-Spam-Status: No Cc: "freebsd-fs@freebsd.org" , Stephen McKay Subject: RE: Constant minor ZFS corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2011 15:08:26 -0000 Quoting Chris Forgeron (from Tue, 15 Mar 2011 10:04:57 -0300): > How big are your pools? I've never had one under 6 TB, which could > be my problems.. or that it's been years since I've tried i386 for > anything other than a firewall. All of my pools on i386 are less than a TB. Not everyone needs that much data. BTW: We prefer to not see top-post quotaing style on this list. Bye, Alexander. > -----Original Message----- > From: Alexander Leidinger [mailto:Alexander@Leidinger.net] > Sent: Friday, March 11, 2011 10:00 AM > To: Chris Forgeron > Cc: Stephen McKay; Mark Felder; freebsd-fs@freebsd.org > Subject: RE: Constant minor ZFS corruption > > Quoting Chris Forgeron (from Thu, 10 Mar 2011 > 16:43:43 -0400): > >> Oh - and you're AMD64, correct, not i386? I think we (royal we) should >> remove support for i385 in ZFS, it has never been stable for me, and I >> see a lot of grief about it on the boards. I also think you need 8 GB >> of RAM to play seriously. 
I've had reasonable success with 4GB and a >> light load, but any serious file traffic needs 8GB of breathing room >> as ZFS gobbles up the RAM in a very aggressive manner. > > Veto! I have two x86 machines, one with "only" 768 MB RAM. Both of > them run with ZFS without problems. The scenario I use them in may > not be the scenario you need to provide a machine for, but there are > scenarios where ZFS on x86 works. > > Bye, > Alexander. > > -- > BOFH excuse #113: > > Root nameservers are out of sync > > http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 > http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > -- BOFH excuse #54: Evil dogs hypnotised the night shift http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Tue Mar 15 15:38:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 51FAF106566B for ; Tue, 15 Mar 2011 15:38:56 +0000 (UTC) (envelope-from lopez.on.the.lists@yellowspace.net) Received: from mail.yellowspace.net (mail.yellowspace.net [80.190.192.217]) by mx1.freebsd.org (Postfix) with ESMTP id CBFF78FC08 for ; Tue, 15 Mar 2011 15:38:55 +0000 (UTC) Received: from furia.intranet ([188.174.148.157]) (AUTH: CRAM-MD5 lopez.on.the.lists@yellowspace.net, SSL: TLSv1/SSLv3, 256bits, CAMELLIA256-SHA) by mail.yellowspace.net with esmtp; Tue, 15 Mar 2011 15:56:52 +0100 id 027C18CE.000000004D7F7E34.00000DB8 Message-ID: <4D7F7E33.7050103@yellowspace.net> Date: Tue, 15 Mar 2011 15:56:51 +0100 From: Lorenzo Perone User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: gmirror performance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2011 15:38:56 -0000 Hi @ list, Hi Pawel, just a question about gmirror performance. I have 2 15k SAS drives, mirrored by gmirror. the mirror was setup like this (like manpage example): gmirror label -v -b split -s 2048 mirr0 da0 da1 on a partition of this drive, I make the following test: # dd if=/dev/zero bs=1m count=2000 of=/mnt/2gigfile.dat 2000+0 records in 2000+0 records out 2097152000 bytes transferred in 11.203763 secs (187182824 bytes/sec) # umount /mnt # mount /dev/mirror/mirr0p4 /mnt # dd if=/mnt/2gigfile.dat of=/dev/null bs=1m 2000+0 records in 2000+0 records out 2097152000 bytes transferred in 12.061197 secs (173875942 bytes/sec) I'd expect read performance to be noticeably higher than write performance. Why is it not the case? Wrong expectation? :/ Further Details: - FreeBSD 8.2-STABLE #0: Tue Mar 15 01:34:07 UTC 2011 - Underlying storage driver is the fresh, just MFC'd mps(4) for the DELL PERC H200A controller (so it could be related to that, as well). - Using bs=8k gets better results (180783854 bytes/sec), but this may be caused by other factors. - The filesystem is UFS with soft-updates (newfs -U). 
Thanx for listening, and for all the nice GEOMs we have @ FreeBSD land :) Regards, Lorenzo From owner-freebsd-fs@FreeBSD.ORG Tue Mar 15 19:59:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D9D631065675 for ; Tue, 15 Mar 2011 19:59:43 +0000 (UTC) (envelope-from giffunip@tutopia.com) Received: from nm30.bullet.mail.sp2.yahoo.com (nm30.bullet.mail.sp2.yahoo.com [98.139.91.100]) by mx1.freebsd.org (Postfix) with SMTP id B651B8FC12 for ; Tue, 15 Mar 2011 19:59:43 +0000 (UTC) Received: from [98.139.91.70] by nm30.bullet.mail.sp2.yahoo.com with NNFMP; 15 Mar 2011 19:59:43 -0000 Received: from [98.139.91.39] by tm10.bullet.mail.sp2.yahoo.com with NNFMP; 15 Mar 2011 19:59:43 -0000 Received: from [127.0.0.1] by omp1039.mail.sp2.yahoo.com with NNFMP; 15 Mar 2011 19:59:43 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 322079.84647.bm@omp1039.mail.sp2.yahoo.com Received: (qmail 68451 invoked by uid 60001); 15 Mar 2011 19:59:43 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1300219183; bh=QMT0xpvgvafj8upaxCwrKHzEh8ibZcda78X/zPhxaVs=; h=Message-ID:X-YMail-OSG:Received:X-RocketYMMF:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=gdVh/9xoD1iYbBG9k/pkKJPP2iNXFzNKtyTOCYljmYuc+Bri8JuS4mg95gFZa10d8cWZ59H/yHZSdBCQ/yHkkXMeSEvSbjvk6qdprqFgDK6AgxehuNYBZaxCyd1jgoxUi5iZI6BmOBUv9RQYP6TWgiAFOXFPDzy+m5OHpO4DvdI= Message-ID: <995608.55186.qm@web113511.mail.gq1.yahoo.com> X-YMail-OSG: OWnEKcMVM1lK2oD8rhe3hLBtVw5xy4FWJZpjdNtf17NaQzM N4KudVm_ET0u_05RGDS6U0YuFLPtx2TlCQi5ts6bXVZWGkGLY1SqMjHNTwX0 ALQ1x0Wo2NBzXx.05Tk0LSP68G2LLUfoE3xsG9q8rJVeMckbGOwCady3kRyP L.u78qlz013GHzIMncAO.5z2NzFwY4WkRxCgq5D2aCFrOrJ00MNHWs6bkxgX ohT4IM_Nx33s3aKW6vBkxSeQi81MRHvMroc10LTwT7pm0_rUhKP4EaZhUx80 dtS8brZjKjAhovG1qhXQruv8o1CR2HMWoZOB9rFApZlwEzl430k0K5XB949R xsX78VfqbzuYIOW3.InP6KhhSGwOdM9cWC3enDS0JXq2l3USfPZzB.utFX9X uVHT9nxp2nEj2 Received: from [200.118.159.214] by web113511.mail.gq1.yahoo.com via HTTP; Tue, 15 Mar 2011 12:59:42 PDT X-RocketYMMF: giffunip X-Mailer: YahooMailClassic/11.4.20 YahooMailWebService/0.8.109.295617 Date: Tue, 15 Mar 2011 12:59:42 -0700 (PDT) From: "Pedro F. Giffuni" To: freebsd-fs@freebsd.org, gnehzuil In-Reply-To: <4D7F58C7.4060402@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Subject: Re: [ext2fs][patch] reallocblks X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: giffunip@tutopia.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2011 19:59:43 -0000 --- On Tue, 3/15/11, gnehzuil wrote: ... > Hi there, > > I have implemented reallocblks in ext2fs. I added some > structures in m_ext2fs to record cluster summary information > due to group descriptor has not a structure to record these > data in disk. So I implemented it in memory. The > implementation is almost the same to in ffs. > > I have done some simple benchmarks with dbench. This patch > can improve the performance a little. > I should mention here that this approach (used by FFS) has shown to be a little better than the reservation window preallocation in performance and is specially important to control fragmentation and filesystem aging: an area that hasn't been worked on very much on other filesystems (notably ext2/3/4). 
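To make the mechanism concrete: VOP_REALLOCBLKS() hands the filesystem a list of logically consecutive, still-dirty buffers, and the filesystem may repoint them at a freshly allocated contiguous range before they reach the disk; the old blocks are freed afterwards. A toy sketch of that final step, using a hypothetical stand-in for the kernel's struct buf (the real ext2fs version is in Zheng's patch above):

#include <stdint.h>

struct buf_sketch {		/* stand-in for the kernel's struct buf */
	int64_t b_lblkno;	/* logical block number in the file */
	int64_t b_blkno;	/* physical (device) block number */
};

/*
 * Repoint a run of logically consecutive dirty buffers at a
 * contiguous physical range starting at "firstblk"; "step" is the
 * device blocks per filesystem block.  Sketch only.
 */
static void
repoint_run(struct buf_sketch **children, int nchildren,
    int64_t firstblk, int64_t step)
{
	int i;

	for (i = 0; i < nchildren; i++)
		children[i]->b_blkno = firstblk + (int64_t)i * step;
}

Because the buffers are still delayed writes when this happens, a file that grew in small increments can still end up on disk in one long run.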
This is loosely related to, and reduces the need for, some other "features" planned in ext4 like delayed allocation, online defragmentation, preallocation and to some extent tailmerging. For more discussion on this topic from a non-BSD perspective, check out: http://www-stud.rbi.informatik.uni-frankfurt.de/~loizides/reiserfs/ I am really glad to see this feature in our ext2 implementation, so thanks Zheng for doing the hard work, and Google for sponsoring him. cheers, Pedro. From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 08:27:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3DDC106564A; Wed, 16 Mar 2011 08:27:11 +0000 (UTC) (envelope-from lists@yamagi.org) Received: from mail.yamagi.overkill.yamagi.org (unknown [IPv6:2a01:4f8:121:2102:1::7]) by mx1.freebsd.org (Postfix) with ESMTP id 337828FC08; Wed, 16 Mar 2011 08:27:11 +0000 (UTC) Received: from saya.home.yamagi.org (unknown [IPv6:2001:5c0:150f:8700:21b:21ff:fe07:b562]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.yamagi.overkill.yamagi.org (Postfix) with ESMTPSA id 15B8D16663D1; Wed, 16 Mar 2011 09:27:09 +0100 (CET) Date: Wed, 16 Mar 2011 09:27:04 +0100 (CET) From: Yamagi Burmeister X-X-Sender: yamagi@saya.home.yamagi.org To: freebsd-fs@freebsd.org Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Cc: mckusick@freebsd.org Subject: Snapshots are never freed on at least 8.1 and 8.2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 08:27:11 -0000 Hello, I'm not sure if this is a bug or the expected behavior but it seems quite strange. On at least FreeBSD 8.1 and 8.2 UFS2 snapshots are never freed while the filesystem is mounted. Therefore you have to remount every 20 snapshots, which is quite a pain when using "dump -L" or similar things via cron. Example: -------- 1. Create a new filesystem and copy some data on it: % mdmfs -s 512M md0 /mnt/ % cp -r /usr/src/sys /mnt/ 2. Create 20 snapshots (in tcsh-syntax): % foreach i ( 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ) % mksnap_ffs /mnt/ /mnt/.snap/${i} % end 3. "snapinfo -a" shows 20 snapshots. % root@saya:pts/3 ~> snapinfo -a /mnt/.snap/1 /mnt/.snap/2 /mnt/.snap/3 /mnt/.snap/4 /mnt/.snap/5 /mnt/.snap/6 /mnt/.snap/7 /mnt/.snap/8 /mnt/.snap/9 /mnt/.snap/10 /mnt/.snap/11 /mnt/.snap/12 /mnt/.snap/13 /mnt/.snap/14 /mnt/.snap/15 /mnt/.snap/16 /mnt/.snap/17 /mnt/.snap/18 /mnt/.snap/19 /mnt/.snap/20 4. Further snapshots cannot be created since there's a limit of 20 snapshots per filesystem: % mksnap_ffs /mnt/ /mnt/.snap/21 mksnap_ffs: Cannot create snapshot /mnt/.snap/21: No space left on device 5. Now delete the snapshots: % rm -Rf /mnt/.snap/* 6. "snapinfo -a" tells us that there are no snapshots in the filesystem: % snapinfo -a % 7. But when we want to create a new snapshot, it fails: % mksnap_ffs /mnt/ /mnt/.snap/1 mksnap_ffs: Cannot create snapshot /mnt/.snap/1: No space left on device 8. "ffsinfo /dev/md0 | grep snapinum" shows us 20 snapshots!
% ffsinfo /dev/md0 | grep snapinum snapinum int32_t[ 0] 0x00000004 snapinum int32_t[ 1] 0x00000005 snapinum int32_t[ 2] 0x00000006 snapinum int32_t[ 3] 0x00000007 snapinum int32_t[ 4] 0x00000008 snapinum int32_t[ 5] 0x00000009 snapinum int32_t[ 6] 0x0000000a snapinum int32_t[ 7] 0x0000000b snapinum int32_t[ 8] 0x0000000c snapinum int32_t[ 9] 0x0000000d snapinum int32_t[10] 0x0000000e snapinum int32_t[11] 0x0000000f snapinum int32_t[12] 0x00000010 snapinum int32_t[13] 0x00000011 snapinum int32_t[14] 0x00000012 snapinum int32_t[15] 0x00000013 snapinum int32_t[16] 0x00000014 snapinum int32_t[17] 0x00000015 snapinum int32_t[18] 0x00000016 snapinum int32_t[19] 0x00000017 snapinum int32_t[ 0] 0x00000000 snapinum int32_t[ 0] 0x00000000 snapinum int32_t[ 0] 0x00000000 snapinum int32_t[ 0] 0x00000000 9. Unmounting and remounting the filesystem lets the kernel print some warnings. But afterwards (without fsck) the snapshots are gone and we can create new ones: % umount /mnt/ % mount /dev/md0 /mnt/ % dmesg | tail -n 20 ffs_snapshot_mount: non-snapshot inode 4 ffs_snapshot_mount: non-snapshot inode 5 ffs_snapshot_mount: non-snapshot inode 6 ffs_snapshot_mount: non-snapshot inode 7 ffs_snapshot_mount: non-snapshot inode 8 ffs_snapshot_mount: non-snapshot inode 9 ffs_snapshot_mount: non-snapshot inode 10 ffs_snapshot_mount: non-snapshot inode 11 ffs_snapshot_mount: non-snapshot inode 12 ffs_snapshot_mount: non-snapshot inode 13 ffs_snapshot_mount: non-snapshot inode 14 ffs_snapshot_mount: non-snapshot inode 15 ffs_snapshot_mount: non-snapshot inode 16 ffs_snapshot_mount: non-snapshot inode 17 ffs_snapshot_mount: non-snapshot inode 18 ffs_snapshot_mount: non-snapshot inode 19 ffs_snapshot_mount: non-snapshot inode 20 ffs_snapshot_mount: non-snapshot inode 21 ffs_snapshot_mount: non-snapshot inode 22 ffs_snapshot_mount: non-snapshot inode 23 (These are the inodes shown by the ffsinfo output above) % ffsinfo /dev/md0 | grep snapinum snapinum int32_t[ 0] 0x00000000 snapinum int32_t[ 0] 0x00000000 snapinum int32_t[ 0] 0x00000000 snapinum int32_t[ 0] 0x00000000 snapinum int32_t[ 0] 0x00000000 % mksnap_ffs /mnt/ /mnt/.snap/1 % snapinfo -a /mnt/.snap/1 Ciao, Yamagi From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 11:00:26 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A1C46106566B; Wed, 16 Mar 2011 11:00:26 +0000 (UTC) (envelope-from lists@yamagi.org) Received: from mail.yamagi.overkill.yamagi.org (unknown [IPv6:2a01:4f8:121:2102:1::7]) by mx1.freebsd.org (Postfix) with ESMTP id 3C7F48FC19; Wed, 16 Mar 2011 11:00:26 +0000 (UTC) Received: from saya.home.yamagi.org (unknown [IPv6:2001:5c0:150f:8700:21b:21ff:fe07:b562]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.yamagi.overkill.yamagi.org (Postfix) with ESMTPSA id 16C3816663D3; Wed, 16 Mar 2011 12:00:24 +0100 (CET) Date: Wed, 16 Mar 2011 12:00:19 +0100 (CET) From: Yamagi Burmeister X-X-Sender: yamagi@saya.home.yamagi.org To: Yamagi Burmeister In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org Subject: Re: Snapshots are never freed on at least 8.1 and 8.2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date:
Wed, 16 Mar 2011 11:00:26 -0000 On Wed, 16 Mar 2011, Yamagi Burmeister wrote: > Hello, > I'm not sure if this is a bug or the expected behavior but it seems quite > strange. On at least FreeBSD 8.1 and 8.2 UFS2 snapshots are never freed > while the filesystem is mounted. Therefore you have to remount every 20 > snapshots, which is quite a pain when using "dump -L" or similar things > via cron. Okay, I had a deeper look into this and it's some kind of PEBKAC (problem exists between keyboard and chair). For various reasons there is no "options FFS" in the kernel of this box, but ufs.ko is loaded in /boot/loader.conf. In sys/modules/ufs/Makefile the CFLAGS are "CFLAGS+= -DSOFTUPDATES -DUFS_DIRHASH". But in sys/ufs/ufs/ufs_lookup.c line 1241 and line 1293 the call to ffs_snapgone() is hidden behind "FFS". Since FFS isn't defined when ufs.ko is built, the call isn't compiled in, the function isn't called and the snapshot isn't correctly removed from the list. So the question is: why is there no -DFFS in the CFLAGS for ufs.ko? At this time snapshots are broken when ufs.ko is used. -- Homepage: www.yamagi.org Jabber: yamagi@yamagi.org GnuPG/GPG: 0xEFBCCBCB From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 11:09:29 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 97EA2106566B for ; Wed, 16 Mar 2011 11:09:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 0DDC58FC22 for ; Wed, 16 Mar 2011 11:09:28 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p2GB9OP0072095 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 16 Mar 2011 13:09:24 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p2GB9OHn061713; Wed, 16 Mar 2011 13:09:24 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p2GB9O47061712; Wed, 16 Mar 2011 13:09:24 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 16 Mar 2011 13:09:24 +0200 From: Kostik Belousov To: Yamagi Burmeister Message-ID: <20110316110924.GN78089@deviant.kiev.zoral.com.ua> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="0EnJpgTqgR2B/yCU" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org Subject: Re: Snapshots are never freed on at least 8.1 and 8.2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 11:09:29 -0000 --0EnJpgTqgR2B/yCU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 7bit On Wed, Mar 16, 2011 at 09:27:04AM +0100, Yamagi Burmeister wrote: >
Hello, > I'm not sure if this is a bug or the expected behavior but it seems quite > strange. On at least FreeBSD 8.1 and 8.2 UFS2 snapshots are never freed > while the filesystem is mounted. Therefore you have to remount every 20 > snapshots, which is quite a pain when using "dump -L" or similar things > via cron. ... Yes, very interesting. It seems that ffs_snapgone() is never called. How our build system mutated over time so that FFS is no longer defined, I do not know and do not much want to track. diff --git a/sys/ufs/ufs/ufs_lookup.c b/sys/ufs/ufs/ufs_lookup.c index e997718..d819f69 100644 --- a/sys/ufs/ufs/ufs_lookup.c +++ b/sys/ufs/ufs/ufs_lookup.c @@ -1252,10 +1252,8 @@ out: * drop its snapshot reference so that it will be reclaimed * when last open reference goes away. */ -#if defined(FFS) || defined(IFS) if (ip != 0 && (ip->i_flags & SF_SNAPSHOT) != 0 && ip->i_effnlink == 0) ffs_snapgone(ip); -#endif return (error); } @@ -1317,10 +1315,8 @@ ufs_dirrewrite(dp, oip, newinum, newtype, isrmdir) * drop its snapshot reference so that it will be reclaimed * when last open reference goes away. */ -#if defined(FFS) || defined(IFS) if ((oip->i_flags & SF_SNAPSHOT) != 0 && oip->i_effnlink == 0) ffs_snapgone(oip); -#endif return (error); } --0EnJpgTqgR2B/yCU Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk2AmmQACgkQC3+MBN1Mb4jAagCguKHfRKWFcDaFH5yArfiDXChA eswAn1QSGSfBTAQsCyfvlhXTiVS1e3hF =QnNx -----END PGP SIGNATURE----- --0EnJpgTqgR2B/yCU-- From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 11:12:19 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DE69A106566B for ; Wed, 16 Mar 2011 11:12:19 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 403AC8FC16 for ; Wed, 16 Mar 2011 11:12:18 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p2GBCFeW072438 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 16 Mar 2011 13:12:15 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p2GBCFhJ061769; Wed, 16 Mar 2011 13:12:15 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p2GBCFWm061768; Wed, 16 Mar 2011 13:12:15 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 16 Mar 2011 13:12:15 +0200 From: Kostik Belousov To: Yamagi Burmeister Message-ID: <20110316111215.GO78089@deviant.kiev.zoral.com.ua> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="2l/qJWwi7aEgZx1i" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org Subject: Re: Snapshots are never freed on at least 8.1 and 8.2
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 11:12:19 -0000 --2l/qJWwi7aEgZx1i Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 7bit On Wed, Mar 16, 2011 at 12:00:19PM +0100, Yamagi Burmeister wrote: > On Wed, 16 Mar 2011, Yamagi Burmeister wrote: > > >Hello, > >I'm not sure if this is a bug or the expected behavior but it seems quite > >strange. On at least FreeBSD 8.1 and 8.2 UFS2 snapshots are never freed > >while the filesystem is mounted. Therefore you have to remount every 20 > >snapshots, which is quite a pain when using "dump -L" or similar things > >via cron. > > Okay, I had a deeper look into this and it's some kind of PEBKAC > (problem exists between keyboard and chair). For various reasons there > is no "options FFS" in the kernel of this box, but ufs.ko is loaded in > /boot/loader.conf. In sys/modules/ufs/Makefile the CFLAGS are "CFLAGS+= > -DSOFTUPDATES -DUFS_DIRHASH". > But in sys/ufs/ufs/ufs_lookup.c line 1241 and line 1293 the call to > ffs_snapgone() is hidden behind "FFS". Since FFS isn't defined when > ufs.ko is built, the call isn't compiled in, the function isn't called > and the snapshot isn't correctly removed from the list. > So the question is: why is there no -DFFS in the CFLAGS for ufs.ko? At > this time snapshots are broken when ufs.ko is used. See my other reply; I think that #ifdef line should be removed altogether. We have not cared much about the ufs/ffs split for a long time, and if we do, we should introduce some operation like UFS_SNAPGONE(). --2l/qJWwi7aEgZx1i Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk2Amw8ACgkQC3+MBN1Mb4jzpQCg0GBz3zbztmyrOItlwBUJ+1i3 CSkAoI1spiMZ07jGPhISOefzvirc3cG1 =Zgej -----END PGP SIGNATURE----- --2l/qJWwi7aEgZx1i-- From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 11:52:06 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 730A01065673; Wed, 16 Mar 2011 11:52:06 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 30C198FC1F; Wed, 16 Mar 2011 11:52:06 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p2GBq6ef058466; Wed, 16 Mar 2011 11:52:06 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p2GBq6N4058462; Wed, 16 Mar 2011 11:52:06 GMT (envelope-from linimon) Date: Wed, 16 Mar 2011 11:52:06 GMT Message-Id: <201103161152.p2GBq6N4058462@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/155587: [zfs] [panic] kernel panic with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 11:52:06 -0000 Old Synopsis: kernel panic with zfs New Synopsis: [zfs] [panic] kernel panic with zfs Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon
Responsible-Changed-When: Wed Mar 16 11:51:50 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=155587 From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 12:01:12 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D81221065676 for ; Wed, 16 Mar 2011 12:01:12 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 91A7F8FC08 for ; Wed, 16 Mar 2011 12:01:12 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1PzpPW-0004AF-1x for freebsd-fs@freebsd.org; Wed, 16 Mar 2011 13:01:06 +0100 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 16 Mar 2011 13:01:06 +0100 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 16 Mar 2011 13:01:06 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Wed, 16 Mar 2011 13:00:52 +0100 Lines: 33 Message-ID: References: <4D7F7E33.7050103@yellowspace.net> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.12) Gecko/20101102 Thunderbird/3.1.6 In-Reply-To: <4D7F7E33.7050103@yellowspace.net> X-Enigmail-Version: 1.1.2 Subject: Re: gmirror performance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 12:01:12 -0000 On 15/03/2011 15:56, Lorenzo Perone wrote: > > Hi @ list, Hi Pawel, > > just a question about gmirror performance. I have 2 15k SAS drives, > mirrored by gmirror. the mirror was setup like this (like manpage example): > > gmirror label -v -b split -s 2048 mirr0 da0 da1 > > on a partition of this drive, I make the following test: > > # dd if=/dev/zero bs=1m count=2000 of=/mnt/2gigfile.dat > 2000+0 records in > 2000+0 records out > 2097152000 bytes transferred in 11.203763 secs (187182824 bytes/sec) > > # umount /mnt > # mount /dev/mirror/mirr0p4 /mnt > > # dd if=/mnt/2gigfile.dat of=/dev/null bs=1m > 2000+0 records in > 2000+0 records out > 2097152000 bytes transferred in 12.061197 secs (173875942 bytes/sec) > > I'd expect read performance to be noticeably higher than write > performance. Why is it not the case? Wrong expectation? :/ Maybe. You can't expect that RAID-1 will have as good performance as RAID-0 but you might achieve better performance for sequential reads with long buffers. Try setting the vfs.read_max sysctl to 128 and see if it helps you. (you might want to leave the gmirror algorithm to the default "load" and increase the stripe size to something sane, like 16k). 
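To see why the slice size matters with the "split" algorithm: a request larger than the slice is chopped into slice-sized pieces spread across the members, so a 2048-byte slice turns a single 128k read into dozens of small per-disk requests. A toy model of that dispatch (hypothetical names, not gmirror's actual code):

#include <stdio.h>

/*
 * Toy model of a two-way "split" mirror: show which member serves
 * each slice of one sequential read request.  Sketch only.
 */
int
main(void)
{
	const long slice = 2048;		/* gmirror label -s 2048 */
	const long request = 128 * 1024;	/* one read-ahead request */
	long off;

	for (off = 0; off < request; off += slice)
		printf("offset %6ld -> disk %ld\n", off, (off / slice) % 2);
	return (0);
}

With "load" or a larger slice, each disk sees a few large sequential requests instead, which is what the drives handle best.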
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 13:48:39 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DF379106566B for ; Wed, 16 Mar 2011 13:48:39 +0000 (UTC) (envelope-from lopez.on.the.lists@yellowspace.net) Received: from mail.yellowspace.net (mail.yellowspace.net [80.190.192.217]) by mx1.freebsd.org (Postfix) with ESMTP id 5CA628FC1A for ; Wed, 16 Mar 2011 13:48:38 +0000 (UTC) Received: from furia.intranet ([93.104.191.142]) (AUTH: CRAM-MD5 lopez.on.the.lists@yellowspace.net, SSL: TLSv1/SSLv3, 256bits, CAMELLIA256-SHA) by mail.yellowspace.net with esmtp; Wed, 16 Mar 2011 14:48:36 +0100 id 027C18CC.000000004D80BFB4.0000A98E Message-ID: <4D80BFB3.20706@yellowspace.net> Date: Wed, 16 Mar 2011 14:48:35 +0100 From: Lorenzo Perone User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 MIME-Version: 1.0 To: Ivan Voras References: <4D7F7E33.7050103@yellowspace.net> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: gmirror performance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 13:48:40 -0000 On 16.03.11 13:00, Ivan Voras wrote: > On 15/03/2011 15:56, Lorenzo Perone wrote: ... >> I'd expect read performance to be noticeably higher than write >> performance. Why is it not the case? Wrong expectation? :/ > Maybe. You can't expect that RAID-1 will have as good performance as > RAID-0 but you might achieve better performance for sequential reads > with long buffers. Try setting the vfs.read_max sysctl to 128 and see if > it helps you. It *does* help! Thanx a great lot! I knew it was a PEBKAC :) sysctl vfs.read_max=128 configure -b load mirr0 just gave me 70MB/s more when reading (256640376 bytes/sec) :) > (you might want to leave the gmirror algorithm to the >> default "load" and increase the stripe size to something sane, like 16k). If you meant gmirror configure -s 16384 mirr0: this didn't change anything for -b load, as expected, but it did change a little for -b split.
To sum up some results, fwimc: test case: umount /mnt && mount /dev/mirror/mirr0p4 /mnt && \ dd if=/mnt/2gigfile.dat bs=1m of=/dev/null * with default vfs.read_max=8 -b split -s 2048: 173875942 bytes/sec -b load: 195143412 bytes/sec * with vfs.read_max=128 -b split -s 2048: 191024137 bytes/sec -b load: 258329216 bytes/sec Big Thanx and Regards, Lorenzo From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 13:50:07 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63341106564A for ; Wed, 16 Mar 2011 13:50:07 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 2E5008FC17 for ; Wed, 16 Mar 2011 13:50:07 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p2GDo7Po058608 for ; Wed, 16 Mar 2011 13:50:07 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p2GDo7Yo058607; Wed, 16 Mar 2011 13:50:07 GMT (envelope-from gnats) Date: Wed, 16 Mar 2011 13:50:07 GMT Message-Id: <201103161350.p2GDo7Yo058607@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Andriy Gapon Cc: Subject: Re: kern/155484: [ufs] GPT + UFS boot don't work well together X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 13:50:07 -0000 The following reply was made to PR kern/155484; it has been noted by GNATS. From: Andriy Gapon To: bug-followup@freebsd.org, rarehawk@gmail.com Cc: Subject: Re: kern/155484: [ufs] GPT + UFS boot don't work well together Date: Wed, 16 Mar 2011 15:42:42 +0200 Let me ask again - did my suggestion help? 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 15:03:29 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8B7E106566C; Wed, 16 Mar 2011 15:03:29 +0000 (UTC) (envelope-from lists@yamagi.org) Received: from mail.yamagi.overkill.yamagi.org (unknown [IPv6:2a01:4f8:121:2102:1::7]) by mx1.freebsd.org (Postfix) with ESMTP id 7E8CB8FC12; Wed, 16 Mar 2011 15:03:29 +0000 (UTC) Received: from saya.home.yamagi.org (unknown [IPv6:2001:5c0:150f:8700:21b:21ff:fe07:b562]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.yamagi.overkill.yamagi.org (Postfix) with ESMTPSA id 1CC7D16663D1; Wed, 16 Mar 2011 16:03:27 +0100 (CET) Date: Wed, 16 Mar 2011 16:03:22 +0100 (CET) From: Yamagi Burmeister X-X-Sender: yamagi@saya.home.yamagi.org To: Kostik Belousov In-Reply-To: <20110316110924.GN78089@deviant.kiev.zoral.com.ua> Message-ID: References: <20110316110924.GN78089@deviant.kiev.zoral.com.ua> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, mckusick@freebsd.org Subject: Re: Snapshots are never freed on at least 8.1 and 8.2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 15:03:30 -0000 On Wed, 16 Mar 2011, Kostik Belousov wrote: > On Wed, Mar 16, 2011 at 09:27:04AM +0100, Yamagi Burmeister wrote: >> Hello, >> I'm not sure if this is a bug or the expected behavior but it seems quite >> strange. On at least FreeBSD 8.1 and 8.2 UFS2 snapshots are never freed >> while the filesystem is mounted. Therefore you have to remount every 20 >> snapshots, which is quite a pain when using "dump -L" or similar things >> via cron. > ... > > Yes, very interesting. It seems that ffs_snapgone() is never called. > How our build system mutated over time so that FFS is no longer > defined, I do not know and do not much want to track. The patch is working as expected. I've done a quick test with GENERIC and one with a separately built module. Thanks.
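A minimal way to reproduce the symptom under discussion, as a sketch only: it assumes /mnt is a mounted UFS2 filesystem, uses the two-argument mksnap_ffs(8) form from the 8.x base system, and the snapshot names are invented.

for i in 1 2 3 4 5; do
    mksnap_ffs /mnt /mnt/.snap/snap$i   # create a snapshot (as root)
    rm -f /mnt/.snap/snap$i             # delete it again
done
df -k /mnt   # on an unpatched kernel the snapshot blocks are not
             # reclaimed until the filesystem is remounted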
:) -- Homepage: www.yamagi.org Jabber: yamagi@yamagi.org GnuPG/GPG: 0xEFBCCBCB From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 20:10:15 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 57551106564A for ; Wed, 16 Mar 2011 20:10:15 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 4AB388FC08 for ; Wed, 16 Mar 2011 20:10:15 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p2GKAF7h007765 for ; Wed, 16 Mar 2011 20:10:15 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p2GKAFaa007764; Wed, 16 Mar 2011 20:10:15 GMT (envelope-from gnats) Date: Wed, 16 Mar 2011 20:10:15 GMT Message-Id: <201103162010.p2GKAFaa007764@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/153552: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 20:10:15 -0000 The following reply was made to PR kern/153552; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/153552: commit references a PR Date: Wed, 16 Mar 2011 20:05:19 +0000 (UTC) Author: ae Date: Wed Mar 16 20:04:56 2011 New Revision: 219702 URL: http://svn.freebsd.org/changeset/base/219702 Log: Set control flags in putc(). This should fix zfsboot hangs in drvread(). 
PR: kern/153552 Reviewed by: jhb MFC after: 1 week Modified: head/sys/boot/i386/common/cons.c

Modified: head/sys/boot/i386/common/cons.c
==============================================================================
--- head/sys/boot/i386/common/cons.c	Wed Mar 16 17:09:51 2011	(r219701)
+++ head/sys/boot/i386/common/cons.c	Wed Mar 16 20:04:56 2011	(r219702)
@@ -37,6 +37,7 @@
 void
 putc(int c)
 {
+	v86.ctl = V86_FLAGS;
 	v86.addr = 0x10;
 	v86.eax = 0xe00 | (c & 0xff);
 	v86.ebx = 0x7;

_______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 21:06:51 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C57991065673; Wed, 16 Mar 2011 21:06:51 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9E8218FC12; Wed, 16 Mar 2011 21:06:51 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p2GL6pJA062608; Wed, 16 Mar 2011 21:06:51 GMT (envelope-from eadler@freefall.freebsd.org) Received: (from eadler@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p2GL6pwm062602; Wed, 16 Mar 2011 16:06:51 -0500 (EST) (envelope-from eadler) Date: Wed, 16 Mar 2011 16:06:51 -0500 (EST) Message-Id: <201103162106.p2GL6pwm062602@freefall.freebsd.org> To: hlh@restart.be, eadler@FreeBSD.org, freebsd-fs@FreeBSD.org, ae@FreeBSD.org From: eadler@FreeBSD.org Cc: Subject: Re: kern/153552: [zfs] zfsboot from 8.2-RC1 freeze at boot time X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 21:06:51 -0000 Synopsis: [zfs] zfsboot from 8.2-RC1 freeze at boot time State-Changed-From-To: open->patched State-Changed-By: eadler State-Changed-When: Wed Mar 16 16:06:45 EST 2011 State-Changed-Why: committed in r219702 Responsible-Changed-From-To: freebsd-fs->ae Responsible-Changed-By: eadler Responsible-Changed-When: Wed Mar 16 16:06:45 EST 2011 Responsible-Changed-Why: committed in r219702 http://www.freebsd.org/cgi/query-pr.cgi?pr=153552 From owner-freebsd-fs@FreeBSD.ORG Wed Mar 16 23:00:26 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AEBC2106566B for ; Wed, 16 Mar 2011 23:00:26 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A03978FC0A for ; Wed, 16 Mar 2011 23:00:26 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p2GN0QvJ061612 for ; Wed, 16 Mar 2011 23:00:26 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p2GN0QRU061611; Wed, 16 Mar 2011 23:00:26 GMT (envelope-from gnats) Date: Wed, 16 Mar 2011 23:00:26 GMT Message-Id: <201103162300.p2GN0QRU061611@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Andrey Vladimirov Cc: Subject: Re: kern/155484: [ufs] GPT +
UFS boot don't work well together X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Andrey Vladimirov List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2011 23:00:26 -0000 The following reply was made to PR kern/155484; it has been noted by GNATS. From: Andrey Vladimirov To: Andriy Gapon Cc: freebsd-gnats-submit@freebsd.org Subject: Re: kern/155484: [ufs] GPT + UFS boot don't work well together Date: Thu, 17 Mar 2011 00:52:58 +0200

> Let me ask again - did my suggestion help?
>
> --
> Andriy Gapon

No (if the swap partition is right after freebsd-boot).

--
Best regards,
Andrey Vladimirov
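For reference, a partition layout like the one Andrey describes (swap placed immediately after the freebsd-boot partition) can be built with gpart roughly as follows; this is a sketch only, with the device name ada0 and the sizes invented:

gpart create -s gpt ada0
gpart add -t freebsd-boot -s 64k ada0   # index 1
gpart add -t freebsd-swap -s 2g ada0    # index 2: swap right after freebsd-boot
gpart add -t freebsd-ufs ada0           # index 3: UFS root
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0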

--001636d34866f7fac5049ea1677b-- From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 01:45:50 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38DC9106564A; Thu, 17 Mar 2011 01:45:50 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0E77F8FC0A; Thu, 17 Mar 2011 01:45:50 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p2H1jnn1015024; Thu, 17 Mar 2011 01:45:49 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p2H1jnua015020; Thu, 17 Mar 2011 01:45:49 GMT (envelope-from linimon) Date: Thu, 17 Mar 2011 01:45:49 GMT Message-Id: <201103170145.p2H1jnua015020@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/155615: [zfs] zfs v28 broken on sparc64 -current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 01:45:50 -0000 Old Synopsis: zfs v28 broken on sparc64 -current New Synopsis: [zfs] zfs v28 broken on sparc64 -current Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Thu Mar 17 01:45:07 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=155615 From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 04:30:34 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BF7581065670 for ; Thu, 17 Mar 2011 04:30:34 +0000 (UTC) (envelope-from luke@digital-crocus.com) Received: from mail.digital-crocus.com (node2.digital-crocus.com [91.209.244.128]) by mx1.freebsd.org (Postfix) with ESMTP id 7B96C8FC0C for ; Thu, 17 Mar 2011 04:30:34 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector; d=hybrid-logic.co.uk; h=Received:Received:Subject:From:Reply-To:To:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse; b=jFlq4QWyPkf8xR7GnKinB/gGQ7VWcBmrSZzkNSG1skvZt+RksAfzlCXW1Bn4crjtNEpwYcx4bU19WkQgKayfRb8vvJ0bH9z9J869dkzr+YwQiUM4GNuih1nd6wTzJV/r; Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Q04Uf-0006FC-2o for freebsd-fs@freebsd.org; Thu, 17 Mar 2011 04:07:25 +0000 Received: from c-76-118-178-109.hsd1.ma.comcast.net ([76.118.178.109] helo=[192.168.1.15]) by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Q04Ue-0006Et-KW; Thu, 17 Mar 2011 04:07:25 +0000 From: Luke Marsden To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, freebsd-current@freebsd.org Content-Type: text/plain; charset="UTF-8" Organization: Hybrid Web Cluster Date: Thu, 17 Mar 2011 00:08:01 -0400 Message-ID: <1300334881.3837.126.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Digital-Crocus-Maillimit: done X-Authenticated-Sender: luke X-Complaints: abuse@digital-crocus.com X-Admin: 
admin@digital-crocus.com X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse reports) Cc: Subject: Guaranteed kernel panic with ZFS + nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: luke@hybrid-logic.co.uk List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 04:30:34 -0000 Hi all, The following script seems to cause a guaranteed kernel panic on 8.1-R, 8.2-R and 8-STABLE as of today (2011-03-16), with both ZFS v14/15, and v28 on 8.2-R with mm@ patches from 2011-03. I suspect it may also affect 9-CURRENT but have not tested this yet.

#!/usr/local/bin/bash
export POOL=hpool # change this to your pool name
sudo zfs destroy -r $POOL/foo
sudo zfs create $POOL/foo
sudo zfs set mountpoint=/foo $POOL/foo
sudo mount -t nullfs /foo /bar
sudo touch /foo/baz
ls /bar # should see baz
sudo zfs umount -f $POOL/foo # seems okay (ls: /bar: Bad file descriptor)
sudo zfs mount $POOL/foo # PANIC!

Can anyone suggest a patch which fixes this? Preferably against 8-STABLE :-) I also have a more subtle problem where, after mounting and then quickly force-unmounting a ZFS filesystem (call it A) with two nullfs-mounted filesystems and a devfs filesystem within it, running "ls" on the mountpoint of the parent filesystem of A hangs. I'm working on narrowing it down to a shell script like the above - as soon as I have one I'll post a followup. This latter problem is actually more of an issue for me - I can avoid the behaviour which triggers the panic ("if it hurts, don't do it"), but I need to be able to perform the actions which trigger the deadlock (mounting and unmounting filesystems). This also affects 8.1-R, 8.2-R, 8-STABLE and 8.2-R+v28. It seems to be the "zfs umount -f" process which hangs, and further accesses to the parent filesystem then hang as well. Note that I have definitely correctly unmounted the nullfs and devfs mounts from within the filesystem before I force the unmount. Unfortunately the -f is necessary in my application. After the hang:

hybrid@dev3:/opt/HybridCluster$ sudo ps ax |grep zfs
   41  ??  DL     0:00.11 [zfskern]
 3751  ??  D      0:00.03 /sbin/zfs unmount -f hpool/hcfs/filesystem1

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 3751
  PID    TID COMM             TDNAME           KSTACK
 3751 100264 zfs              -                mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c zfsvfs_teardown+0x269 zfs_umount+0x1a7 dounmount+0x28a unmount+0x3c8 syscall+0x1e7 Xfast_syscall+0xe1

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 41
  PID    TID COMM             TDNAME           KSTACK
   41 100058 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
   41 100062 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
   41 100090 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
   41 100091 zfskern          txg_thread_enter mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 txg_thread_wait+0x3c txg_sync_thread+0x355 fork_exit+0x118 fork_trampoline+0xe

I will continue to attempt to create a shell script which makes this latter bug easily reproducible. In the meantime, what further information can I gather? I will build a debug kernel in the morning. If it helps accelerate finding a solution to this problem, Hybrid Logic Ltd might be able to fund a small bounty for a fix.
Contact me off-list if you can help in this way. -- Best Regards, Luke Marsden CTO, Hybrid Logic Ltd. Web: http://www.hybrid-cluster.com/ Hybrid Web Cluster - cloud web hosting Phone: +441172232002 / +16179496062 From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 07:16:19 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 035EF106566B for ; Thu, 17 Mar 2011 07:16:19 +0000 (UTC) (envelope-from marcus@blazingdot.com) Received: from marklar.blazingdot.com (marklar.blazingdot.com [207.154.84.83]) by mx1.freebsd.org (Postfix) with SMTP id CB20D8FC1B for ; Thu, 17 Mar 2011 07:16:18 +0000 (UTC) Received: (qmail 53589 invoked by uid 503); 17 Mar 2011 07:16:18 -0000 Date: Wed, 16 Mar 2011 23:16:18 -0800 From: Marcus Reid To: freebsd-fs@freebsd.org Message-ID: <20110317071618.GB49199@blazingdot.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Coffee-Level: nearly-fatal User-Agent: Mutt/1.5.6i Subject: ZFS vfs.zfs.cache_flush_disable and ZIL reliability X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 07:16:19 -0000 Hi, I was just doing some reading about write barriers being used in filesystems to ensure that the journal is complete to prevent data corruption on unexpected failure. This is done by flushing the disk cache after making a journal entry and before writing to the rest of the fs. I figured I'd look to see what different filesystems do for this. In Linux, ext3 and ext4 have a "barrier" mount option which controls this. It's the subject of much debate and was turned off by default until 2.6.28 in ext4 (it's still off by default in ext3) because it can significantly reduce performance in some workloads. FreeBSD g_journal is not configurable -- it flushes the cache and looks to be safe. My only worry is that it looks like it might even flush it too often, but there may be a reason for the extra flush. I'm having a hard time finding where the rubber meets the road with the ZFS ZIL though (one does not just walk into Mordor.) I got as far as finding the vfs.zfs.cache_flush_disable sysctl which sets zfs_nocacheflush which is referenced in zil_add_block() in zil.c but haven't found where the actual flushing happens. Can someone who is more familiar with it comment on whether this is happening? 
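For anyone chasing the same code path, a sketch of where to look: the file locations assume a stock 8-STABLE /usr/src, and the call chain named in the comments (zil_flush_vdevs() building flush requests via zio_flush(), which end up as GEOM BIO_FLUSH bios) is from a quick reading of the code, so treat it as a pointer rather than an authoritative answer.

cd /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs
grep -n zfs_nocacheflush zil.c       # zil_add_block() and zil_flush_vdevs() check it
grep -n DKIOCFLUSHWRITECACHE zio.c   # zio_flush() issues the flush ioctl
grep -n BIO_FLUSH vdev_geom.c        # where the ioctl becomes a GEOM BIO_FLUSH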
Thanks, Marcus From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 07:24:01 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 60A931065677 for ; Thu, 17 Mar 2011 07:24:01 +0000 (UTC) (envelope-from marcus@blazingdot.com) Received: from marklar.blazingdot.com (marklar.blazingdot.com [207.154.84.83]) by mx1.freebsd.org (Postfix) with SMTP id 315C48FC1B for ; Thu, 17 Mar 2011 07:24:01 +0000 (UTC) Received: (qmail 52327 invoked by uid 503); 17 Mar 2011 06:57:20 -0000 Date: Wed, 16 Mar 2011 22:57:20 -0800 From: Marcus Reid To: Lorenzo Perone Message-ID: <20110317065720.GA49199@blazingdot.com> References: <4D7F7E33.7050103@yellowspace.net> <4D80BFB3.20706@yellowspace.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4D80BFB3.20706@yellowspace.net> X-Coffee-Level: nearly-fatal User-Agent: Mutt/1.5.6i Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: gmirror performance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 07:24:01 -0000 On Wed, Mar 16, 2011 at 02:48:35PM +0100, Lorenzo Perone wrote: > On 16.03.11 13:00, Ivan Voras wrote: > > >On 15/03/2011 15:56, Lorenzo Perone wrote: > ... > >>I'd expect read performance to be noticeably higher than write > >>performance. Why is it not the case? Wrong expectation? :/ > > >Maybe. You can't expect that RAID-1 will have as good performance as > >RAID-0 but you might achieve better performance for sequential reads > >with long buffers. Try setting the vfs.read_max sysctl to 128 and see if > >it helps you. > > It *does* help! > > Thanx a great lot! I knew I it was a PEBKAC :) > > sysctl vfs.read_max=128 > configure -b load mirr0 > > just gave me 70MB/s more when reading (256640376 bytes/sec) :) > > >(you might want to leave the gmirror algorithm to the > >>default "load" and increase the stripe size to something sane, like 16k). > > If You meant gmirror configure -s 16384 mirr0: this didn't change > anything for -b load, as expected, but it did change a little for -b split. > > To sum up some results, fwimc: > > test case: > > umount /mnt && mount /dev/mirror/mirr0p4 /mnt && \ > dd if=/mnt/2gigfile.dat bs=1m of=/dev/null > > * with default vfs.read_max=8 > > -b split -s 2048: > 173875942 bytes/sec > > -b load: > 195143412 bytes/sec > > * with vfs.read_max=128 > > -b split -s 2048: > 191024137 bytes/sec > > -b load: > 258329216 bytes/sec Wow, that's great. I just almost doubled big sequential read performance on one of my machines with this too. The question now is why the defaults are the way they are... Does a big vfs.read_max (described as "Cluster read-ahead max block count") pessimize performance in some other way? 
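The whole experiment fits in a few commands; a sketch, reusing the mirror and file names from Lorenzo's setup, which are assumptions for any other system:

sysctl vfs.read_max                             # stock default is 8
sysctl vfs.read_max=128
gmirror configure -b load mirr0                 # "load" balance algorithm
umount /mnt && mount /dev/mirror/mirr0p4 /mnt   # remount to drop cached data
dd if=/mnt/2gigfile.dat bs=1m of=/dev/null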
Marcus From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 07:46:01 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5B545106568D for ; Thu, 17 Mar 2011 07:46:01 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from QMTA11.westchester.pa.mail.comcast.net (qmta11.westchester.pa.mail.comcast.net [76.96.59.211]) by mx1.freebsd.org (Postfix) with ESMTP id 0B39B8FC2B for ; Thu, 17 Mar 2011 07:46:00 +0000 (UTC) Received: from omta15.westchester.pa.mail.comcast.net ([76.96.62.87]) by QMTA11.westchester.pa.mail.comcast.net with comcast id L7kg1g0061swQuc5B7m1KU; Thu, 17 Mar 2011 07:46:01 +0000 Received: from koitsu.dyndns.org ([76.102.12.206]) by omta15.westchester.pa.mail.comcast.net with comcast id L7lz1g00z4SkFJc3b7m01A; Thu, 17 Mar 2011 07:46:01 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id AF79C9B429; Thu, 17 Mar 2011 00:45:58 -0700 (PDT) Date: Thu, 17 Mar 2011 00:45:58 -0700 From: Jeremy Chadwick To: Marcus Reid Message-ID: <20110317074558.GA2248@icarus.home.lan> References: <20110317071618.GB49199@blazingdot.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110317071618.GB49199@blazingdot.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS vfs.zfs.cache_flush_disable and ZIL reliability X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 07:46:01 -0000 On Wed, Mar 16, 2011 at 11:16:18PM -0800, Marcus Reid wrote: > I was just doing some reading about write barriers being used in > filesystems to ensure that the journal is complete to prevent data > corruption on unexpected failure. This is done by flushing the > disk cache after making a journal entry and before writing to > the rest of the fs. > > I figured I'd look to see what different filesystems do for this. > > In Linux, ext3 and ext4 have a "barrier" mount option which controls > this. It's the subject of much debate and was turned off by default > until 2.6.28 in ext4 (it's still off by default in ext3) because it > can significantly reduce performance in some workloads. > > FreeBSD g_journal is not configurable -- it flushes the cache and > looks to be safe. My only worry is that it looks like it might even > flush it too often, but there may be a reason for the extra flush. > > I'm having a hard time finding where the rubber meets the road with > the ZFS ZIL though (one does not just walk into Mordor.) I got as > far as finding the vfs.zfs.cache_flush_disable sysctl which sets > zfs_nocacheflush which is referenced in zil_add_block() in zil.c > but haven't found where the actual flushing happens. Can someone > who is more familiar with it comment on whether this is happening? I think what you might be looking for is BIO_FLUSH, which is a kernel thing. I could have the name wrong; someone will need to correct me. Whenever this topic comes up, I always ask people the same 2 questions: 1) What *absolute guarantee* do you have that data *actually gets written to the platters* when BIO_FLUSH is called? You can sync/sync/sync all you want -- there's no guarantee that the hard disk itself (that is to say, the cache that lives on the hard disk) has fully written all of its data to its platters. 
2) What do you think will happen when the hard disk abruptly loses power? Could be the system PSU dying, could be the power circuitry on the drive failing, could be a "quirk" that causes the drive to power-cycle itself, etc... General question to users and/or developers: Can someone please explain to me why people are so horribly focused (I would go as far to say OCD) on this topic? Won't there *always* be some degree of potential loss of data in the above two circumstances? Shouldn't the concern be less about "how much data just got lost" and more about "is the filesystem actually usable and clean/correct?" (ZFS implements the latter two assuming you're using mirror or raidz). Sorry for the rant, I just keep seeing this topic come up over and over and over and over and over and it blows my mind. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 08:04:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7EC4A1065672 for ; Thu, 17 Mar 2011 08:04:45 +0000 (UTC) (envelope-from marcus@blazingdot.com) Received: from marklar.blazingdot.com (marklar.blazingdot.com [207.154.84.83]) by mx1.freebsd.org (Postfix) with SMTP id 4EC938FC0A for ; Thu, 17 Mar 2011 08:04:45 +0000 (UTC) Received: (qmail 75587 invoked by uid 503); 17 Mar 2011 08:04:35 -0000 Date: Thu, 17 Mar 2011 00:04:35 -0800 From: Marcus Reid To: Jeremy Chadwick Message-ID: <20110317080435.GC49199@blazingdot.com> References: <20110317071618.GB49199@blazingdot.com> <20110317074558.GA2248@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110317074558.GA2248@icarus.home.lan> X-Coffee-Level: nearly-fatal User-Agent: Mutt/1.5.6i Cc: freebsd-fs@freebsd.org Subject: Re: ZFS vfs.zfs.cache_flush_disable and ZIL reliability X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 08:04:45 -0000 On Thu, Mar 17, 2011 at 12:45:58AM -0700, Jeremy Chadwick wrote: > General question to users and/or developers: > > Can someone please explain to me why people are so horribly focused (I > would go as far to say OCD) on this topic? > > Won't there *always* be some degree of potential loss of data in the > above two circumstances? Shouldn't the concern be less about "how much > data just got lost" and more about "is the filesystem actually usable > and clean/correct?" (ZFS implements the latter two assuming you're > using mirror or raidz). I'm going to venture that it's so important because it's the one big most likely thing to go wrong if you're not using the right hardware or aren't configured right. Don't want to be that guy with the bad pool reaching for the backup tapes. Was also just curious to lay my eyes on the line that flushes the cache and was having a hard time finding it :/ > Sorry for the rant, I just keep seeing this topic come up over and over > and over and over and over and it blows my mind. I did search around a bit and couldn't find an old thread. The topic of disk caches and all the stuff around it has come up a lot over the years, but not quite in this context. 
Marcus From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 08:37:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC4C3106564A for ; Thu, 17 Mar 2011 08:37:45 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 4DB138FC1A for ; Thu, 17 Mar 2011 08:37:45 +0000 (UTC) Received: from outgoing.leidinger.net (p5B15588E.dip.t-dialin.net [91.21.88.142]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E73A184400E; Thu, 17 Mar 2011 09:37:39 +0100 (CET) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id AA82B3C68; Thu, 17 Mar 2011 09:37:36 +0100 (CET) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p2H8bFAd027852; Thu, 17 Mar 2011 09:37:15 +0100 (CET) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 17 Mar 2011 09:37:15 +0100 Message-ID: <20110317093715.300351qg801prjgo@webmail.leidinger.net> Date: Thu, 17 Mar 2011 09:37:15 +0100 From: Alexander Leidinger To: Jeremy Chadwick References: <20110317071618.GB49199@blazingdot.com> <20110317074558.GA2248@icarus.home.lan> In-Reply-To: <20110317074558.GA2248@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: E73A184400E.A3D35 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0, required 6, autolearn=disabled) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1300955860.8858@uTM8Qu1IesIQ/sF6pGKOBw X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: ZFS vfs.zfs.cache_flush_disable and ZIL reliability X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 08:37:45 -0000 Quoting Jeremy Chadwick (from Thu, 17 Mar 2011 00:45:58 -0700): > Whenever this topic comes up, I always ask people the same 2 questions: > > > 1) What *absolute guarantee* do you have that data *actually gets > written to the platters* when BIO_FLUSH is called? You can > sync/sync/sync all you want -- there's no guarantee that the hard disk > itself (that is to say, the cache that lives on the hard disk) has fully > written all of its data to its platters. Obvious answer: None, if the disk lies to you. > 2) What do you think will happen when the hard disk abruptly loses > power? Could be the system PSU dying, could be the power circuitry on > the drive failing, could be a "quirk" that causes the drive to > power-cycle itself, etc... Obvious answer: You lose the data written since the last sync (if the FS is DTRT like UFS+softupdates/journal or ZFS). > General question to users and/or developers: > > Can someone please explain to me why people are so horribly focused (I > would go as far to say OCD) on this topic?
> > Won't there *always* be some degree of potential loss of data in the > above two circumstances? Shouldn't the concern be less about "how much > data just got lost" and more about "is the filesystem actually usable > and clean/correct?" (ZFS implements the latter two assuming you're > using mirror or raidz). You always want to have a consistent FS, that's for sure. Parts of the consistency guarantees depend upon having some data on disk for sure before other changes are made. You do not want to have data (FS meta-data) from before a flush point reordered in the cache behind data (FS meta-data) which was written after the flush point. You also want to lose as little data as possible: think about your bank account while doing transactions. If the disk lies, it could be (attention, huge simplification here) that your transaction to someone was made but the bank "forgets" to remove the money from your account. This is surely something none of us would mind, but the bank does. The other way around (someone transfers money to you, it is removed from his account, but not added to yours) is a more unpleasant case, one you surely would object to. I'm sure you know about the "only acknowledge to the remote side if the data is really stored" way of handling transfers (mail, DB, ...). If the disk lies to you, you cannot do anything (maybe you got what you paid for), but if you have disks which actually DTRT, you do not lose mail (sender retries) or money (the transaction processing can restart from the last ACKed point). Bye, Alexander. -- Even if you do learn to speak correct English, whom are you going to speak it to? -- Clarence Darrow http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 09:54:05 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 87D96106566C; Thu, 17 Mar 2011 09:54:05 +0000 (UTC) (envelope-from hlh@restart.be) Received: from tignes.restart.be (tignes.restart.be [IPv6:2001:41d0:2:56bf:0:1::]) by mx1.freebsd.org (Postfix) with ESMTP id 0F9758FC08; Thu, 17 Mar 2011 09:54:05 +0000 (UTC) Received: from restart.be (avoriaz.tunnel.bel [IPv6:2001:41d0:2:56bf:1:ffff::]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "smtp.restart.be", Issuer "CA master" (verified OK)) by tignes.restart.be (Postfix) with ESMTPS id 0F6431401D; Thu, 17 Mar 2011 10:54:04 +0100 (CET) Received: from morzine.restart.bel (morzine.restart.be [IPv6:2001:41d0:2:56bf:1:2::]) (authenticated bits=0) by restart.be (8.14.4/8.14.4) with ESMTP id p2H9s3x7060811; Thu, 17 Mar 2011 10:54:03 +0100 (CET) (envelope-from hlh@restart.be) X-DKIM: Sendmail DKIM Filter v2.8.3 restart.be p2H9s3x7060811 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=restart.be; s=avoriaz; t=1300355643; bh=SPaqZi3RkYcUQjjrGBJKSfm2e+JvB8BJeiAJ8jWeTO8=; h=Message-ID:Date:From:MIME-Version:To:CC:Subject:References: In-Reply-To:Content-Type:Content-Transfer-Encoding; b=WWJ8ML3Yz4J1wgqUsnrr4Lkb4GkBN5BloKqq29G6haVvDLutknHh/v4xnXVEuJyap ENlGzEzrm8hsURTN9kAjA== X-DomainKeys: Sendmail DomainKeys Filter v1.0.2 restart.be p2H9s3x7060811 DomainKey-Signature: a=rsa-sha1; s=avoriaz; d=restart.be; c=nofws; q=dns; h=message-id:date:from:organization:user-agent:mime-version:to:cc: subject:references:in-reply-to:content-type:content-transfer-encoding;
b=ZU60tWSRW+4LHGH9/2xE4+J9DyUoduLpiKaOtw/l1BmfxPm+IgjG5Rk3AvHBJ49VX 3Jfaj+fR4NcVijFcA+CzQ== Message-ID: <4D81DA3B.6030909@restart.be> Date: Thu, 17 Mar 2011 10:54:03 +0100 From: Henri Hennebert Organization: RestartSoft User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9 MIME-Version: 1.0 To: eadler@FreeBSD.org References: <201103162106.p2GL6pwm062602@freefall.freebsd.org> In-Reply-To: <201103162106.p2GL6pwm062602@freefall.freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, ae@FreeBSD.org Subject: Re: kern/153552: [zfs] zfsboot from 8.2-RC1 freeze at boot time X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 09:54:05 -0000 On 03/16/2011 22:06, eadler@FreeBSD.org wrote: > Synopsis: [zfs] zfsboot from 8.2-RC1 freeze at boot time > > State-Changed-From-To: open->patched > State-Changed-By: eadler > State-Changed-When: Wed Mar 16 16:06:45 EST 2011 > State-Changed-Why: > committed in r219702 > > > Responsible-Changed-From-To: freebsd-fs->ae > Responsible-Changed-By: eadler > Responsible-Changed-When: Wed Mar 16 16:06:45 EST 2011 > Responsible-Changed-Why: > committed in r219702 > > http://www.freebsd.org/cgi/query-pr.cgi?pr=153552 > My problem is solved! Thank you Henri From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 11:37:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 69D5C1065679 for ; Thu, 17 Mar 2011 11:37:49 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id D65228FC17 for ; Thu, 17 Mar 2011 11:37:48 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p2HBbiRS084762 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 17 Mar 2011 13:37:44 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p2HBbiNL069483; Thu, 17 Mar 2011 13:37:44 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p2HBbihk069482; Thu, 17 Mar 2011 13:37:44 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 17 Mar 2011 13:37:44 +0200 From: Kostik Belousov To: luke@hybrid-logic.co.uk Message-ID: <20110317113744.GT78089@deviant.kiev.zoral.com.ua> References: <1300334881.3837.126.camel@pow> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="8ck+19+G1KZxmYE5" Content-Disposition: inline In-Reply-To: <1300334881.3837.126.camel@pow> User-Agent: Mutt/1.4.2.3i X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org Subject: Re: Guaranteed kernel panic with ZFS + nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list 
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 11:37:49 -0000 --8ck+19+G1KZxmYE5 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [Enormous Cc: list trimmed down] On Thu, Mar 17, 2011 at 12:08:01AM -0400, Luke Marsden wrote: > Hi all, > > The following script seems to cause a guaranteed kernel panic on 8.1-R, > 8.2-R and 8-STABLE as of today (2011-03-16), with both ZFS v14/15, and > v28 on 8.2-R with mm@ patches from 2011-03. I suspect it may also affect > 9-CURRENT but have not tested this yet.
>
> #!/usr/local/bin/bash
> export POOL=hpool # change this to your pool name
> sudo zfs destroy -r $POOL/foo
> sudo zfs create $POOL/foo
> sudo zfs set mountpoint=/foo $POOL/foo
> sudo mount -t nullfs /foo /bar
> sudo touch /foo/baz
> ls /bar # should see baz
> sudo zfs umount -f $POOL/foo # seems okay (ls: /bar: Bad file
> descriptor)
> sudo zfs mount $POOL/foo # PANIC!
>
> Can anyone suggest a patch which fixes this? Preferably against
> 8-STABLE :-)

Please show the backtrace.

--8ck+19+G1KZxmYE5 Content-Type: application/pgp-signature Content-Disposition: inline
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk2B8ocACgkQC3+MBN1Mb4gijwCgwwMrkZTS63WrcITbR8X9XNYT
4QgAoOnm/8+rlNvBI/OPSx1aUFlQd5pV
=cO+R
-----END PGP SIGNATURE-----
--8ck+19+G1KZxmYE5-- From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 11:47:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 95061106566C for ; Thu, 17 Mar 2011 11:47:48 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 469F18FC0C for ; Thu, 17 Mar 2011 11:47:47 +0000 (UTC) Received: by qwc9 with SMTP id 9so2176096qwc.13 for ; Thu, 17 Mar 2011 04:47:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:from :date:x-google-sender-auth:message-id:subject:to:cc:content-type :content-transfer-encoding; bh=aHXgDN9HMeB0w/3jbFCk7uLJpXCK+NQTACEcKtZcYic=; b=CQxVhppg2PICbeaRk9J+l4oWwoEuywex9xR9MW8favi/WmlykLkxMdP6f890EzdF/k bxc5ESdi+tORFPLkSJm7BM+k8AB95ftVeI6CbjTPv+rXHj+R5hKlrlJAtumo+YLfwtoc dc/BXcMYNK7rBGxPvvrFJ7CrkN5zf5Qku2OKU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type :content-transfer-encoding; b=DYE4vsr6Xls80KoiDjeKor0Lvp5qewPh3Z3yl2LG17JR/ju/5YTnnlQxAZuKe3ZysV amzdmSoqECrOxmoW5tLj4igpmNR1nNgQmzTuvMbMeJjz5doFZ0UbS38VbsEoIkZ9YCYV nqXyAkF9phd14RWv4j6TVTWrKQSgTJftdtXEE= Received: by 10.229.111.225 with SMTP id t33mr971072qcp.61.1300360914093; Thu, 17 Mar 2011 04:21:54 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.229.78.193 with HTTP; Thu, 17 Mar 2011 04:21:14 -0700 (PDT) In-Reply-To: <20110317065720.GA49199@blazingdot.com> References: <4D7F7E33.7050103@yellowspace.net> <4D80BFB3.20706@yellowspace.net> <20110317065720.GA49199@blazingdot.com> From: Ivan Voras Date: Thu, 17 Mar 2011 12:21:14 +0100 X-Google-Sender-Auth: Bo5k6wDvX7P_8SbopHmAV3lbKio Message-ID: To: Marcus Reid Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re:
gmirror performance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 11:47:48 -0000 On 17 March 2011 07:57, Marcus Reid wrote: > > Wow, that's great. I just almost doubled big sequential read performance > on one of my machines with this too. The question now is why the defaults > are the way they are... Does a big vfs.read_max (described as "Cluster > read-ahead max block count") pessimize performance in some other way? Note that it will only help sequential reads. If you have a database, e-mail server or any other random IO load, it will not help you one bit. On the other hand, it will also not harm performance in that type of environment. It's an old tunable which has not been properly investigated ever since it was last modified 10 years ago; this and a few other obscure file system tunables (hi/lo runningspace, dirhash_maxmem) will be much more sanely tuned for 9.0. From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 12:34:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9B4E106564A for ; Thu, 17 Mar 2011 12:34:43 +0000 (UTC) (envelope-from luke@digital-crocus.com) Received: from mail.digital-crocus.com (node2.digital-crocus.com [91.209.244.128]) by mx1.freebsd.org (Postfix) with ESMTP id 5D04D8FC13 for ; Thu, 17 Mar 2011 12:34:43 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector; d=hybrid-logic.co.uk; h=Received:Received:Subject:From:Reply-To:To:Cc:In-Reply-To:References:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse; b=TroqHugbCGdpPfsevXsHAX3Z6pg139tm0DDspkfynBTeDEHckbyV4g8J7+/mFSpHB8zUvzSXMzMHy2KwEdkxnMoI3BOcDRQ1PVcCtAXoKOp4TLOjs3t1OJ0QP/1Ter+a; Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Q0COy-000N6K-HT for freebsd-fs@freebsd.org; Thu, 17 Mar 2011 12:34:04 +0000 Received: from c-76-118-178-109.hsd1.ma.comcast.net ([76.118.178.109] helo=[192.168.1.15]) by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Q0COy-000N64-1z; Thu, 17 Mar 2011 12:34:04 +0000 From: Luke Marsden To: Kostik Belousov In-Reply-To: <20110317113744.GT78089@deviant.kiev.zoral.com.ua> References: <1300334881.3837.126.camel@pow> <20110317113744.GT78089@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset="UTF-8" Organization: Hybrid Web Cluster Date: Thu, 17 Mar 2011 08:34:40 -0400 Message-ID: <1300365280.3837.129.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.30.3 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Digital-Crocus-Maillimit: done X-Authenticated-Sender: luke X-Complaints: abuse@digital-crocus.com X-Admin: admin@digital-crocus.com X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse reports) Cc: freebsd-fs@freebsd.org Subject: Re: Guaranteed kernel panic with ZFS + nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: luke@hybrid-logic.co.uk List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 12:34:43 -0000 On Thu, 2011-03-17 at 13:37 +0200, Kostik Belousov wrote: > [Enormous Cc: list trimmed down] > On
Thu, Mar 17, 2011 at 12:08:01AM -0400, Luke Marsden wrote: > > Hi all, > > > > The following script seems to cause a guaranteed kernel panic on 8.1-R, > > 8.2-R and 8-STABLE as of today (2011-03-16), with both ZFS v14/15, and > > v28 on 8.2-R with mm@ patches from 2011-03. I suspect it may also affect > > 9-CURRENT but have not tested this yet. > > > > #!/usr/local/bin/bash > > export POOL=hpool # change this to your pool name > > sudo zfs destroy -r $POOL/foo > > sudo zfs create $POOL/foo > > sudo zfs set mountpoint=/foo $POOL/foo > > sudo mount -t nullfs /foo /bar > > sudo touch /foo/baz > > ls /bar # should see baz > > sudo zfs umount -f $POOL/foo # seems okay (ls: /bar: Bad file > > descriptor) > > sudo zfs mount $POOL/foo # PANIC! > > > > Can anyone suggest a patch which fixes this? Preferably against > > 8-STABLE :-) > Please show the backtrace. > Here you go: http://lukemarsden.net/zfs-panic-1.png http://lukemarsden.net/zfs-panic-2.png Thank you! -- Best Regards, Luke Marsden CTO, Hybrid Logic Ltd. Web: http://www.hybrid-cluster.com/ Hybrid Web Cluster - cloud web hosting Phone: +441172232002 / +16179496062 From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 13:41:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D36A6106564A; Thu, 17 Mar 2011 13:41:48 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AD2DB8FC08; Thu, 17 Mar 2011 13:41:47 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA26109; Thu, 17 Mar 2011 15:41:43 +0200 (EET) (envelope-from avg@freebsd.org) Message-ID: <4D820F97.5090201@freebsd.org> Date: Thu, 17 Mar 2011 15:41:43 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110309 Lightning/1.0b2 Thunderbird/3.1.9 MIME-Version: 1.0 To: luke@hybrid-logic.co.uk References: <1300334881.3837.126.camel@pow> <20110317113744.GT78089@deviant.kiev.zoral.com.ua> <1300365280.3837.129.camel@pow> In-Reply-To: <1300365280.3837.129.camel@pow> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Guaranteed kernel panic with ZFS + nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 13:41:48 -0000 on 17/03/2011 14:34 Luke Marsden said the following: > On Thu, 2011-03-17 at 13:37 +0200, Kostik Belousov wrote: >> [Enormous Cc: list trimmed down] >> On Thu, Mar 17, 2011 at 12:08:01AM -0400, Luke Marsden wrote: >>> Hi all, >>> >>> The following script seems to cause a guaranteed kernel panic on 8.1-R, >>> 8.2-R and 8-STABLE as of today (2011-03-16), with both ZFS v14/15, and >>> v28 on 8.2-R with mm@ patches from 2011-03. I suspect it may also affect >>> 9-CURRENT but have not tested this yet. 
>>> >>> #!/usr/local/bin/bash >>> export POOL=hpool # change this to your pool name >>> sudo zfs destroy -r $POOL/foo >>> sudo zfs create $POOL/foo >>> sudo zfs set mountpoint=/foo $POOL/foo >>> sudo mount -t nullfs /foo /bar >>> sudo touch /foo/baz >>> ls /bar # should see baz >>> sudo zfs umount -f $POOL/foo # seems okay (ls: /bar: Bad file >>> descriptor) I believe that it's a bad idea to forcefully unmount a filesystem under a nullfs mount. Without -f the unmounting wouldn't succeed? >>> sudo zfs mount $POOL/foo # PANIC! >>> >>> Can anyone suggest a patch which fixes this? Preferably against >>> 8-STABLE :-) >> Please show the backtrace. >> > > Here you go: > > http://lukemarsden.net/zfs-panic-1.png > http://lukemarsden.net/zfs-panic-2.png IMO this is expected. I am not sure if this is ZFS specific or if it can happen with any kind of an underlying filesystem. Maybe Edward would be interested in fixing this behavior? :) -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 13:50:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 448FC1065673 for ; Thu, 17 Mar 2011 13:50:28 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id ED5F08FC14 for ; Thu, 17 Mar 2011 13:50:27 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1Q0Das-0007wS-6N for freebsd-fs@freebsd.org; Thu, 17 Mar 2011 14:50:26 +0100 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 17 Mar 2011 14:50:26 +0100 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 17 Mar 2011 14:50:26 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Thu, 17 Mar 2011 14:50:07 +0100 Lines: 24 Message-ID: References: <20110317071618.GB49199@blazingdot.com> <20110317074558.GA2248@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.12) Gecko/20101102 Thunderbird/3.1.6 In-Reply-To: <20110317074558.GA2248@icarus.home.lan> X-Enigmail-Version: 1.1.2 Subject: Re: ZFS vfs.zfs.cache_flush_disable and ZIL reliability X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 13:50:28 -0000 On 17/03/2011 08:45, Jeremy Chadwick wrote: > General question to users and/or developers: > > Can someone please explain to me why people are so horribly focused (I > would go as far to say OCD) on this topic? For me, it's a matter of statistics; if this can improve the chances of data surviving by (just guessing here) 10%, it might be worth it. > Won't there *always* be some degree of potential loss of data in the > above two circumstances? Shouldn't the concern be less about "how much > data just got lost" and more about "is the filesystem actually usable > and clean/correct?" (ZFS implements the latter two assuming you're > using mirror or raidz). 
As an admin, I'd much rather have a file system that's clean with some data lost, but as a user I think I would be unhappy with any data loss :) Backups still rule. ZFS is covered (presumably, as I don't really get from the code how it's supposed to work - any clarification from pjd@, mm@ and others would be appreciated) and I've talked with McKusick about BIO_FLUSH in UFS and it will happen as soon as I have the time to follow up. From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 20:02:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 93884106566B for ; Thu, 17 Mar 2011 20:02:06 +0000 (UTC) (envelope-from tom@claimlynx.com) Received: from na3sys009aog112.obsmtp.com (na3sys009aog112.obsmtp.com [74.125.149.207]) by mx1.freebsd.org (Postfix) with ESMTP id 2FECA8FC16 for ; Thu, 17 Mar 2011 20:02:05 +0000 (UTC) Received: from source ([209.85.212.52]) (using TLSv1) by na3sys009aob112.postini.com ([74.125.148.12]) with SMTP ID DSNKTYJovR29nMqu1SHeNNCQqxdRAAFdkf5v@postini.com; Thu, 17 Mar 2011 13:02:06 PDT Received: by vws16 with SMTP id 16so3619437vws.11 for ; Thu, 17 Mar 2011 13:02:04 -0700 (PDT) MIME-Version: 1.0 Received: by 10.52.0.41 with SMTP id 9mr154756vdb.212.1300390364593; Thu, 17 Mar 2011 12:32:44 -0700 (PDT) Received: by 10.52.157.229 with HTTP; Thu, 17 Mar 2011 12:32:44 -0700 (PDT) Date: Thu, 17 Mar 2011 14:32:44 -0500 Message-ID: From: Thomas Johnson To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: HAST + ZFS causes system to shutdown uncleanly? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 20:02:06 -0000 Has anyone else noticed issues halting a system that is configured with a ZFS filesystem on a HAST device? I am using HAST to replicate a ZFS filesystem between two ESXi virtual machines (trying to emulate our production systems in a test environment) and I've noticed that the system doesn't seem to shut down completely in this arrangement (hangs after "" message). I did some poking around and learned that if I unmount my zfs filesystems before shutdown, the shutdown finishes cleanly. Muddling my way through the rc scripts, it looks like hastd is killed fairly early on in the shutdown sequence. Presumably this is preventing the system from syncing/unmounting the ZFS mounts, causing the shutdown to hang. Does this seem plausible? If so, any ideas on a fix, besides making sure I run 'zfs unmount -a' before shutdown?
-- Thomas Johnson From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 20:36:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CC0D21065673 for ; Thu, 17 Mar 2011 20:36:14 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 8A6B68FC13 for ; Thu, 17 Mar 2011 20:36:14 +0000 (UTC) Received: by gwb15 with SMTP id 15so1463269gwb.13 for ; Thu, 17 Mar 2011 13:36:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=awjZ16NZUcPNnn2JjKu4E5EQ3Za4iduLvB3HH8yjOfY=; b=jzZK0TpcgxO6Gce5jaN2kILs0TionnJu+R1uooSP0VSiv58LhQoeDv/jWzY0GryJgA MXHsmWVyGbnOjnOM3vM6O4CaYTLK+H5ZADYtFe5iJXFmx95SnvDvCUvrhhE5rYyz4TM9 79IJK7+zIuC9cYQzQYSsYL/M6NVPHie/7E97E= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=i0DB0JMwro4g5kc/Eb9VMZch9+jlZ7GWycvzKZ/ChbtoO4dVPaLA27VIg3J69onGFA +fbb/+aUEAy3VWNlSpfN+LrxBZpltn2KylrVmQebUEIDjaUgpEbfhR2irNpw46Ef2643 /YLqEZWz4SrT11zUPOU9aywBQTDL9fdZygH8s= MIME-Version: 1.0 Received: by 10.90.126.19 with SMTP id y19mr295423agc.114.1300394173829; Thu, 17 Mar 2011 13:36:13 -0700 (PDT) Received: by 10.90.83.18 with HTTP; Thu, 17 Mar 2011 13:36:13 -0700 (PDT) In-Reply-To: References: Date: Thu, 17 Mar 2011 13:36:13 -0700 Message-ID: From: Freddie Cash To: Thomas Johnson Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: HAST + ZFS causes system to shutdown uncleanly? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2011 20:36:14 -0000 On Thu, Mar 17, 2011 at 12:32 PM, Thomas Johnson wrote: > Has anyone else noticed issues halting a system that is configured with a > ZFS filesystem on a HAST device? I am using HAST to replicate a ZFS > filesystem between two ESXi virtual machines (trying to emulate our > production systems in a test environment) and I've noticed that the system > doesn't seem to shutdown completely in this arrangement (hangs after "" > message). I did some poking around and learned that if I unmount my zfs > filesystems before shutdown, the shutdown finishes cleanly. Muddling my way > through the rc scripts, it looks like hastd is killed fairly early on in the > shutdown sequence. Presumably this is preventing the system from > syncing/unmounting the ZFS mounts, causing the shutdown to hang. > > Does this seem plausible? If so, any ideas on fix, besides making sure I > 'zfs unmount -a' before shutdown? Does it work if you manually add "hastd" to the REQUIRE: line in /etc/rc.d/zfs? Of course, that only works if you are starting zfs automatically via /etc/rc.conf, and not letting CARP/devd or something else manage the pool import process. 
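Concretely, that suggestion amounts to editing the rcorder metadata at the top of /etc/rc.d/zfs; a sketch, with the stock REQUIRE line quoted from memory and therefore possibly not exact:

# /etc/rc.d/zfs, before:
#   # REQUIRE: mountcritlocal
# after (zfs then starts after hastd, and stops before it at shutdown):
#   # REQUIRE: mountcritlocal hastd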
From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 20:42:10 2011
From: Freddie Cash <fjwcash@gmail.com>
Date: Thu, 17 Mar 2011 13:42:09 -0700
To: Thomas Johnson
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

On Thu, Mar 17, 2011 at 1:36 PM, Freddie Cash wrote:
> On Thu, Mar 17, 2011 at 12:32 PM, Thomas Johnson wrote:
>> Has anyone else noticed issues halting a system that is configured with a
>> ZFS filesystem on a HAST device? [...]
>
> Does it work if you manually add "hastd" to the REQUIRE: line in /etc/rc.d/zfs?
>
> Of course, that only works if you are starting zfs automatically via
> /etc/rc.conf, and not letting CARP/devd or something else manage the
> pool import process.

Thinking about it, perhaps we need a hook into the top of the hastd_stop_precmd() function in /etc/rc.d/hastd?

Something like "hastd_stop_args" in /etc/rc.conf where we can put commands to be run before hastd is stopped?
Then it would be as simple as putting hastd_stop_args="zfs unmount -a" into /etc/rc.conf.

Or something along those lines, so that we stop any consumers of the /dev/hast/* devices before we stop the hast daemon.

-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Thu Mar 17 21:00:41 2011
From: Thomas Johnson <tom@claimlynx.com>
Date: Thu, 17 Mar 2011 16:00:38 -0500
To: Freddie Cash
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

(replying again with the list CCd)

Adding hastd to the REQUIRE line in the zfs script does not have any effect, although I'm not even sure /etc/rc.d/zfs gets called during shutdown ('rcorder -k shutdown /etc/rc.d/*' would seem to indicate that it does not). I am using devd/CARP to manage my pools, but it seems to me that if the zfs rc script were running on shutdown it would handle this case properly, since the zfs script appears to simply run 'zfs unmount -a'. I did add/test with zfs_enable=YES in my rc.conf, to no avail.

A shutdown hook was my thought too.

Also, to clarify an omission in my initial email, the VM hangs after the "All buffers synced." message on shutdown.

On Thu, Mar 17, 2011 at 3:36 PM, Freddie Cash wrote:
> Does it work if you manually add "hastd" to the REQUIRE: line in
> /etc/rc.d/zfs?
>
> Of course, that only works if you are starting zfs automatically via
> /etc/rc.conf, and not letting CARP/devd or something else manage the
> pool import process.
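Freddie's hastd_stop_args is a proposed knob, not an existing rc.conf variable. A local sketch of what the hook might look like - hastd_stop_precmd() does exist in /etc/rc.d/hastd, but the hook body below is an assumption:

    # /etc/rc.d/hastd (sketch of a local modification only)
    hastd_stop_precmd()
    {
        # Run any user-supplied cleanup -- e.g. unmounting consumers of
        # /dev/hast/* -- while hastd is still alive to service them.
        if [ -n "${hastd_stop_args}" ]; then
            ${hastd_stop_args}
        fi
        # ... original precmd body continues here ...
    }

with the matching entry in /etc/rc.conf:

    hastd_stop_args="zfs unmount -a"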
From owner-freebsd-fs@FreeBSD.ORG Fri Mar 18 12:23:17 2011
From: a.smith@ukgrid.net
Date: Fri, 18 Mar 2011 12:23:11 +0000
To: tom@claimlynx.com
Cc: freebsd-fs@freebsd.org
Subject: RE: HAST + ZFS causes system to shutdown uncleanly?

> I am using HAST to replicate a ZFS
> filesystem between two ESXi virtual machines (trying to emulate our
> production systems in a test environment)

If your goal is just to replicate your data to a test environment, is there a reason you are not using ZFS send/receive over SSH rather than HAST? It would be a simpler configuration, and therefore should be less error prone IMO.

thanks Andy.
From owner-freebsd-fs@FreeBSD.ORG Fri Mar 18 13:21:43 2011
From: Thomas Johnson <tom@claimlynx.com>
Date: Fri, 18 Mar 2011 08:21:41 -0500
To: a.smith@ukgrid.net
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

The idea is not to replicate data from production to testing; I probably didn't explain that very well. In our production network, we have a pair of NFS servers attached to a Dell MD3000 disk cabinet. Since I can't directly recreate that setup in ESXi, HAST seems like a good alternative, since it gives me failover access to the same data on my virtualized NFS heads.

On Fri, Mar 18, 2011 at 7:23 AM, a.smith@ukgrid.net wrote:
> If your goal is just to replicate your data to a test environment, is there
> a reason you are not using ZFS send/receive over SSH rather than HAST? It
> would be a simpler configuration, and therefore should be less error prone
> IMO.

-- 
Thomas Johnson
ClaimLynx, Inc.
952-593-5969 x2302

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 18 13:53:10 2011
From: a.smith@ukgrid.net
Date: Fri, 18 Mar 2011 13:53:08 +0000
To: Thomas Johnson
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

Quoting Thomas Johnson:
> The idea is not to replicate data from production to testing; I probably
> didn't explain that very well. [...]

Ah ok, well depending on your requirement ZFS send/receive might still work with a bit of scripting, but HAST in theory should be simpler to set up, I guess. Another option would be a third ESXi virtual host running an iSCSI target to provide shared storage to the two test NFS servers...

ta Andy.
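For comparison, the send/receive approach Andy mentions usually amounts to a snapshot plus a pipe over ssh; 'tank/test' and 'testhost' below are placeholder names:

    # One-time full copy to the test box:
    zfs snapshot tank/test@base
    zfs send tank/test@base | ssh testhost zfs receive -F tank/test

    # Later, ship only the changes since the base snapshot:
    SNAP=tank/test@$(date +%Y%m%d%H%M)
    zfs snapshot "$SNAP"
    zfs send -i tank/test@base "$SNAP" | ssh testhost zfs receive tank/test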
From owner-freebsd-fs@FreeBSD.ORG Fri Mar 18 20:59:52 2011
From: Mikolaj Golub <to.my.trociny@gmail.com>
Date: Fri, 18 Mar 2011 22:59:47 +0200
To: Freddie Cash
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

On Thu, 17 Mar 2011 13:42:09 -0700 Freddie Cash wrote:

FC> Thinking about it, perhaps we need a hook into the top of the
FC> hastd_stop_precmd() function in /etc/rc.d/hastd?
FC> Something like "hastd_stop_args" in /etc/rc.conf where we can put
FC> commands to be run before hastd is stopped?

FC> Then it would be as simple as putting hastd_stop_args="zfs unmount -a"
FC> into /etc/rc.conf.

FC> Or something along those lines, so that we stop any consumers of the
FC> /dev/hast/* devices before we stop the hast daemon.

IMHO, it is not HAST's job to bother with such things. We always have something (heartbeat, carp, hastmon) to manage HAST (change role, mount filesystems, start applications). This something has its own rc scripts; on startup it sets roles and mounts filesystems (if needed), and on shutdown it should do all the necessary cleanup.

-- 
Mikolaj Golub

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 18 21:11:20 2011
From: Freddie Cash <fjwcash@gmail.com>
Date: Fri, 18 Mar 2011 14:11:19 -0700
To: Mikolaj Golub
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

On Fri, Mar 18, 2011 at 1:59 PM, Mikolaj Golub wrote:
> IMHO, it is not HAST's job to bother with such things. We always have something
> (heartbeat, carp, hastmon) to manage HAST (change role, mount filesystems, start
> applications). This something has its own rc scripts; on startup it sets roles
> and mounts filesystems (if needed), and on shutdown it should do all the
> necessary cleanup.

Unless I'm missing something here, this has nothing to do with shutting off the master node in a HAST setup, where the ZFS pool is mounted, when the slave node is already offline. As far as CARP, devd, heartbeat, etc. are concerned, everything is up and running correctly. No need to unmount the pool, as it's not switching to slave mode.

Or, are you suggesting that part of the "shutdown procedure" would be to switch it to slave first, then shutdown?
-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 18 21:24:21 2011
From: Mikolaj Golub <to.my.trociny@gmail.com>
Date: Fri, 18 Mar 2011 23:24:17 +0200
To: Freddie Cash
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

On Fri, 18 Mar 2011 14:11:19 -0700 Freddie Cash wrote:

FC> Or, are you suggesting that part of the "shutdown procedure" would be
FC> to switch it to slave first, then shutdown?

Yes, something like this. Slave, or I would rather call it "init". In any case it should run the script that is called when the node is switched to secondary.

-- 
Mikolaj Golub
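A sketch of the demote-before-halt procedure Mikolaj describes, as a local helper script; hastctl(8) and its role subcommand are real, while the pool name 'tank' and resource name 'shared' are placeholders:

    #!/bin/sh
    # Tear down consumers of /dev/hast/* while hastd is still running,
    # then demote this node and power off.
    zfs unmount -a                  # flush and unmount ZFS filesystems
    zpool export tank               # release the pool cleanly
    hastctl role secondary shared   # or "hastctl role init shared"
    shutdown -p now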
From owner-freebsd-fs@FreeBSD.ORG Sat Mar 19 00:01:57 2011
From: Lorenzo Perone <lopez.on.the.lists@yellowspace.net>
Date: Sat, 19 Mar 2011 01:01:55 +0100
To: freebsd-fs@freebsd.org
Subject: Re: HAST + ZFS causes system to shutdown uncleanly?

On 17.03.11 22:00, Thomas Johnson wrote:
> Adding hastd to the REQUIRE line in the zfs script does not have any
> effect [...]
>
> Also, to clarify an omission in my initial email, the VM hangs after the
> "All buffers synced." message on shutdown.

Now that I read this, I must add that I recently had the same thing in another situation, unrelated to HAST. I'm not sure it is necessarily related, but if it is, it might be helpful to know:

When testing the hot-pluggability of drives in a non-redundant zpool, I ran into the same situation: stuck at "All buffers synced.". I did an evil thing for testing: I just pulled one of the drives (in a non-redundant pool). This was noticed by the OS without a panic (yippee! good news!), and also by zpool status ('One or more devices are faulted in response to IO failures.' 'Make sure the affected devices are connected, then run zpool clear.'). Drives in zpool status were still all listed as ONLINE (not correct - but this might also be related to the underlying driver). After reinserting the drive, an attempt to 'zpool clear' hung in the shell controlling it. I could do anything else, including 'shutdown -r now', but then I was stuck at "All buffers synced." too.

My case had nothing to do with HAST, but it looks like ZFS hangs here when it loses a vdev component? If that is the case, I wonder if it should be filed as a bug (I mean, if we get as far as syncing all buffers, hell, let's reboot ;)). Note that my pool was perfectly okay after the manual reset/reboot (as yours seems to be, too).
It was even already cleared, so apparently the 'zpool clear' had succeeded before hanging.

BTW: I must really say that the number of subjects on this list containing a bad word like 'unclean', 'problem', or 'crash' AND "ZFS" is quite unjust (and mostly turns out to be something else's fault): ZFS ROCKS on FreeBSD when used with good hardware (in my small but heavy production experience of the last 2 years).

Regards,
Lorenzo

From owner-freebsd-fs@FreeBSD.ORG Sat Mar 19 01:53:36 2011
From: Michael DeMan <freebsd@deman.com>
Date: Fri, 18 Mar 2011 18:36:55 -0700
To: freebsd-fs@freebsd.org
Subject: Fwd: [zfs-discuss] [OpenIndiana-discuss] best migration path from Solaris 10

Hi All,

If folks here are interested, let me know and I will get the data up somewhere on the benchmarks I am running. Definitely neither scientific nor rigorous - just 'bonnie bash' some older hardware and see what happens. I can put the full data on the hardware up as well.

Thanks,
- mike

Begin forwarded message:

> From: Michael DeMan
> Date: March 18, 2011 6:06:25 PM PDT
> To: David Brodbeck
> Cc: Discussion list for OpenIndiana, "zfs-discuss@opensolaris.org discuss"
> Subject: Re: [zfs-discuss] [OpenIndiana-discuss] best migration path from Solaris 10
>
> Hi David,
>
> Caught your note about bonnie; I'm actually doing some testing myself over the weekend.
>
> All on older hardware for fun - dual Opteron 285 with 16GB RAM. The disk system is off a pair of SuperMicro SATA cards, with a combination of WD enterprise and Seagate ES 1TB drives. No ZIL, no L2ARC, no tuning at all from the base FreeNAS install.
>
> 10 drives total. I'm going to be running the tests below, mostly curious about IOPS and to sort out a little debate with a co-worker:
>
> - all 10 in one raidz2 (running now)
> - 5 x 2-way mirrors
> - 2 x 5-disk raidz1
>
> The script is below - if folks would find the data I collect useful at all, let me know and I will post it publicly somewhere.
>
> freenas# cat test.sh
> #!/bin/sh
>
> # Basic test for file I/O. We run lots and lots of the traditional
> # 'bonnie' tool at 50GB file size, starting one every minute. Resulting
> # data should give us a good work mixture in the middle, given all the
> # different tests that bonnie runs, 100 instances running at the same
> # time, and at different stages of their processing.
>
> MAX=100
> COUNT=0
>
> FILESYSTEM=testrz2
> LOG=${FILESYSTEM}.log
>
> date > ${LOG}
> echo "Test with file system named ${FILESYSTEM} and configuration of..." >> ${LOG}
> zpool status >> ${LOG}
>
> # DEMAN grab zfs and regular dev iostats every 10 minutes during test
> zpool iostat -v 600 >> ${LOG} &
> iostat -w 600 ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7 ada8 ada9 > ${LOG}.iostat &
>
> while [ $COUNT -lt $MAX ]; do        # -lt so exactly $MAX instances start
>     echo kicking off bonnie
>     bonnie -d /mnt/${FILESYSTEM} -s 50000 &
>     sleep 60
>     COUNT=$((COUNT+1))               # was $((count+1)): sh variables are
>                                      # case-sensitive, so the lowercase name
>                                      # never incremented and the loop would
>                                      # not terminate
> done
>
> On Mar 18, 2011, at 3:26 PM, David Brodbeck wrote:
>
>> I'm in a similar position, so I'll be curious what kinds of responses you get. I can give you a thumbnail sketch of what I've looked at so far:
>>
>> I evaluated FreeBSD, and ruled it out because I need NFSv4, and FreeBSD's NFSv4 support is still in an early stage. The NFS stability and performance just isn't there yet, in my opinion.
>>
>> Nexenta Core looked promising, but locked up in bonnie++ NFS testing with our RedHat nodes, so its stability is a bit of a question mark for me.
>>
>> I haven't gotten the opportunity to thoroughly evaluate OpenIndiana yet. It's only available as a DVD ISO, and my test machine currently has only a CD-ROM drive. Changing that is on my to-do list, but other things keep slipping in ahead of it.
>>
>> For now I'm running OpenSolaris, with a locally-compiled version of Samba. (The OpenSolaris Samba package is very old and has several unpatched security holes at this point.)
>>
>> --
>> David Brodbeck
>> System Administrator, Linguistics
>> University of Washington
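If anyone wants to reproduce the run, a minimal invocation under the same assumptions (the classic bonnie benchmark installed, e.g. from ports benchmarks/bonnie, and the pool mounted at /mnt/testrz2) would be to start the script in one terminal and watch the logs it writes in another:

    sh test.sh
    tail -f testrz2.log            # zpool status + 'zpool iostat -v' samples
    tail -f testrz2.log.iostat     # per-device iostat samples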