From owner-freebsd-fs@FreeBSD.ORG Sun Aug 15 23:38:05 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DAE4B1065694 for ; Sun, 15 Aug 2010 23:38:05 +0000 (UTC) (envelope-from cryx-freebsd@h3q.com)
Received: from mail.h3q.com (mail.h3q.com [213.73.89.199]) by mx1.freebsd.org (Postfix) with ESMTP id 2D2758FC1B for ; Sun, 15 Aug 2010 23:38:04 +0000 (UTC)
Received: (qmail 67237 invoked from network); 15 Aug 2010 23:38:03 -0000
Received: from mail.h3q.com (HELO mail.h3q.com) (cryx) by mail.h3q.com with AES256-SHA encrypted SMTP; 15 Aug 2010 23:38:03 -0000
Message-ID: <4C687A5A.3060808@h3q.com>
Date: Mon, 16 Aug 2010 01:38:02 +0200
From: Philipp Wuensche
User-Agent: Postbox 1.1.5 (Macintosh/20100613)
MIME-Version: 1.0
To: Pawel Jakub Dawidek
References: <20100813191521.GB2006@garage.freebsd.pl>
In-Reply-To: <20100813191521.GB2006@garage.freebsd.pl>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: Converting sysinstalled FreeBSD into ZFS-only server.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 15 Aug 2010 23:38:05 -0000

Pawel Jakub Dawidek wrote:
>
> I'm also open to comments on the layout I proposed. I use it for quite a
> while now and I tried different ones before too, but this one I simply
> find the best. If our official installer will support creating ZFS-only
> install, I'll be forcing this layout, so if you think something is
> _very_ wrong about it, let me know.

Nice idea, using dummy filesystems to get filesystems in the right
position!

But my fear is that the use of dummy filesystems like system/usr and
system/var will confuse some users. Someone might start to mess around
in /var/db or some other subdirectory of /usr or /var, think he is safe
because he took a snapshot of /var or /usr beforehand, and then find out
that those snapshots didn't include the stuff he messed around with.

Greetings,
Philipp

From owner-freebsd-fs@FreeBSD.ORG Mon Aug 16 11:06:59 2010
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C8DF1065694 for ; Mon, 16 Aug 2010 11:06:59 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 4E53B8FC25 for ; Mon, 16 Aug 2010 11:06:59 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7GB6x5s058870 for ; Mon, 16 Aug 2010 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7GB6wVr058868 for freebsd-fs@FreeBSD.org; Mon, 16 Aug 2010 11:06:58 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 16 Aug 2010 11:06:58 GMT
Message-Id: <201008161106.o7GB6wVr058868@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 Aug 2010 11:06:59 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/149495  fs         [zfs] chflags sappend on zfs not working right
o kern/149297  fs         [zfs] zpool iostat fails to show 666GB
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149022  fs         [hang] File system operations hangs with suspfs state
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/147292  fs         [nfs] [patch] readahead missing in nfs client options
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/146375  fs         [nfs] [patch] Typos in macro variables names in sys/fs
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
o kern/145309  fs         [disklabel]: Editing disk label invalidates the whole
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
o kern/144458  fs         [nfs] [patch] nfsd fails as a kld
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o kern/143345  fs         [ext2fs] [patch] extfs minor header cleanups to better
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142924  fs         [ext2fs] [patch] Small cleanup for the inode struct in
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142401  fs         [ntfs] [patch] Minor updates to NTFS from NetBSD
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
o bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/139363  fs         [nfs] diskless root nfs mount from non FreeBSD server
o kern/138790  fs         [zfs] ZFS ceases caching when mem demand is high
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
f kern/137037  fs         [zfs] [hang] zfs rollback on root causes FreeBSD to fr
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
o kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs         [panic] panic: ffs_truncate: read-only filesystem
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150  fs         [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs         [iconv] usermount fails on fs that need iconv
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/129059  fs         [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs         smbd(8) causes periodic panic on 7-RELEASE
o kern/127420  fs         [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
p kern/124621  fs         [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs         [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs         [ufs] snapinfo(8) (and related tools?) only work for t
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
f kern/119735  fs         [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with
p kern/116608  fs         [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o kern/115645  fs         [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs         [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o kern/85326   fs         [smbfs] [panic] saving a file via samba to an overquot
o kern/84589   fs         [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

188 problems total.

From owner-freebsd-fs@FreeBSD.ORG Mon Aug 16 12:10:05 2010
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 979BF106564A for ; Mon, 16 Aug 2010 12:10:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 86B868FC13 for ; Mon, 16 Aug 2010 12:10:05 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7GCA5xf022399 for ; Mon, 16 Aug 2010 12:10:05 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7GCA5AK022398; Mon, 16 Aug 2010 12:10:05 GMT (envelope-from gnats)
Date: Mon, 16 Aug 2010 12:10:05 GMT
Message-Id: <201008161210.o7GCA5AK022398@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: "Anton A. Barsukov"
Cc:
Subject: Re: kern/149297: [zfs] zpool iostat fails to show 666GB
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: "Anton A. Barsukov"
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 Aug 2010 12:10:05 -0000

The following reply was made to PR kern/149297; it has been noted by GNATS.

From: "Anton A. Barsukov"
To: bug-followup@FreeBSD.org, freebsd@blackring.net
Cc:
Subject: Re: kern/149297: [zfs] zpool iostat fails to show 666GB
Date: Mon, 16 Aug 2010 18:04:27 +0600

 [root@backup /root]# uname -srm
 FreeBSD 8.1-STABLE amd64

 [root@backup /root]# zpool iostat
                capacity     operations    bandwidth
 pool         used  avail   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 zvault      1.72T  1.87T      6      7   365K   349K

 [root@backup /root]# zpool status
   pool: zvault
  state: ONLINE
  scrub: none requested
 config:

         NAME                                            STATE     READ WRITE CKSUM
         zvault                                          ONLINE       0     0     0
           raidz1                                        ONLINE       0     0     0
             gptid/aa9f7dbb-1712-11df-b7e3-003048d1fffe  ONLINE       0     0     0
             gptid/ab73dd83-1712-11df-b7e3-003048d1fffe  ONLINE       0     0     0
             gptid/ac49499a-1712-11df-b7e3-003048d1fffe  ONLINE       0     0     0
             gptid/ad1ba31b-1712-11df-b7e3-003048d1fffe  ONLINE       0     0     0

 errors: No known data errors

 I think you need zpool scrub.

 --
 WBR Anton Barsukov | iscsi@zesdid.com | PGP ID: DB73CCC8

From owner-freebsd-fs@FreeBSD.ORG Mon Aug 16 13:57:20 2010
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 46CD61065694; Mon, 16 Aug 2010 13:57:20 +0000 (UTC) (envelope-from ae@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C7B9B8FC12; Mon, 16 Aug 2010 13:57:19 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7GDvJVF030271; Mon, 16 Aug 2010 13:57:19 GMT (envelope-from ae@freefall.freebsd.org)
Received: (from ae@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7GDvJLQ030267; Mon, 16 Aug 2010 13:57:19 GMT (envelope-from ae)
Date: Mon, 16 Aug 2010 13:57:19 GMT
Message-Id: <201008161357.o7GDvJLQ030267@freefall.freebsd.org>
To: freebsd@blackring.net, ae@FreeBSD.org, freebsd-fs@FreeBSD.org
From: ae@FreeBSD.org
Cc:
Subject: Re: kern/149297: [zfs] zpool iostat fails to show 666GB
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 Aug 2010 13:57:20 -0000

Synopsis: [zfs] zpool iostat fails to show 666GB

State-Changed-From-To: open->closed
State-Changed-By: ae
State-Changed-When: Mon Aug 16 13:53:21 UTC 2010
State-Changed-Why:
zpool(8) uses the zfs_nicenum function from libzfs_util.c for formatting
numbers, and that function has no special cases for the number 666.

http://www.freebsd.org/cgi/query-pr.cgi?pr=149297

From owner-freebsd-fs@FreeBSD.ORG Mon Aug 16 18:53:46 2010
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 899BA1065696; Mon, 16 Aug 2010 18:53:46 +0000 (UTC) (envelope-from arundel@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6034D8FC0A; Mon, 16 Aug 2010 18:53:46 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7GIrk8B023595; Mon, 16 Aug 2010 18:53:46 GMT (envelope-from arundel@freefall.freebsd.org)
Received: (from arundel@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7GIrjWp023591; Mon, 16 Aug 2010 18:53:45 GMT (envelope-from arundel)
Date: Mon, 16 Aug 2010 18:53:45 GMT
Message-Id: <201008161853.o7GIrjWp023591@freefall.freebsd.org>
To: 4ertus2@mail.ru, arundel@FreeBSD.org, freebsd-fs@FreeBSD.org
From: arundel@FreeBSD.org
Cc:
Subject: Re: kern/130229: [iconv] usermount fails on fs that need iconv
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 16 Aug 2010 18:53:46 -0000

Synopsis: [iconv] usermount fails on fs that need iconv

State-Changed-From-To: open->closed
State-Changed-By: arundel
State-Changed-When: Mon Aug 16 18:51:02 UTC 2010
State-Changed-Why:
Duplicate of kern/109024 and bin/93857.

http://www.freebsd.org/cgi/query-pr.cgi?pr=130229

From owner-freebsd-fs@FreeBSD.ORG Tue Aug 17 22:10:08 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 553631065693; Tue, 17 Aug 2010 22:10:08 +0000 (UTC) (envelope-from thomas@gibfest.dk)
Received: from mail.tyknet.dk (mail.tyknet.dk [213.150.42.155]) by mx1.freebsd.org (Postfix) with ESMTP id D57BB8FC19; Tue, 17 Aug 2010 22:10:07 +0000 (UTC)
Received: from [10.32.67.66] (fw.int.webpartner.dk [213.150.34.98]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id A6F4C63A821; Wed, 18 Aug 2010 00:10:05 +0200 (CEST)
X-DKIM: OpenDKIM Filter v2.1.3 mail.tyknet.dk A6F4C63A821
Authentication-Results: mail.tyknet.dk; dkim=none (no signature); dkim-adsp=none
Message-ID: <4C6B08BD.9080206@gibfest.dk>
Date: Wed, 18 Aug 2010 00:10:05 +0200
From: Thomas Steen Rasmussen
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6
MIME-Version: 1.0
To: Pawel Jakub Dawidek
References: <4C57E20E.2030908@gibfest.dk> <20100806135001.GF1710@garage.freebsd.pl> <4C5ECA78.6010803@gibfest.dk> <20100810075528.GA1754@garage.freebsd.pl> <4C61CF4D.4060009@gibfest.dk> <4C651B7E.5000805@gibfest.dk> <20100816221059.GE2611@garage.freebsd.pl>
In-Reply-To: <20100816221059.GE2611@garage.freebsd.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: HAST initial sync speed
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 17 Aug 2010 22:10:08 -0000

On 17-08-2010 00:10, Pawel Jakub Dawidek wrote:
> On Fri, Aug 13, 2010 at 12:16:30PM +0200, Thomas Steen Rasmussen wrote:
>
>> Just a quick update, it is still working on syncing the one HAST
>> resource configured the other day:
>>
>> [root@server1 ~]# date && hastctl status
>> Thu Aug 12 14:11:18 CEST 2010
>> hasthd4:
>>   role: primary
>>   provname: hasthd4
>>   localpath: /dev/label/hd4
>>   extentsize: 2097152
>>   keepdirty: 64
>>   remoteaddr: 192.168.0.15
>>   replication: memsync
>>   status: complete
>>   dirty: 102651396096 bytes
>> [root@server1 ~]# date && hastctl status
>> Fri Aug 13 09:48:06 CEST 2010
>> hasthd4:
>>   role: primary
>>   provname: hasthd4
>>   localpath: /dev/label/hd4
>>   extentsize: 2097152
>>   keepdirty: 64
>>   remoteaddr: 192.168.0.15
>>   replication: memsync
>>   status: complete
>>   dirty: 80425779200 bytes
>>
>> Just under 20 gigabytes in just under 20 hours.
>>
>> Any suggestions are appreciated,
>>
> I'm sorry for the delay, I needed some time to prepare test environment.
> Currently I'm running synchronization between two HAST nodes connected
> with 1Gb link and it took 4 minutes 5 seconds to synchronize 16GB of
> data, so the speed was around 68MB/s.
>
> I was doing the test on memory-backed md(4) devices to exclude disks
> speed. Could you do similar test? You need to create md(4) devices this
> way on both nodes:
>
> # mdconfig -a -t malloc -s 16g -o compress
>
> The 'compress' option will make md(4) devices to consume no space when
> writing just zeros.
>
> My hast.conf looks like this:
>
> resource test {
>         local /dev/md0
>
>         on nodea {
>                 remote tcp4://10.0.0.1
>         }
>         on nodeb {
>                 remote tcp4://10.0.0.2
>         }
> }
>
> This will help us to tell how fast is your network. You can observe
> speed with gstat(8).
>
>

Hello,

I performed the tests with md devices like you asked. It seems like
using memory disks doesn't make any difference.
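As a rough cross-check, the two earlier hastctl samples quoted above can be turned into an average sync rate. This is only a sketch in Python: the timestamps, dirty counters and extent size are copied from the quoted output, the names (`first`, `second`, `sync_progress`) are made up for illustration, and it assumes the dirty counter only went down between the two samples.

```python
from datetime import datetime

EXTENT_SIZE = 2097152  # "extentsize" from the quoted hastctl output, in bytes

# (wall-clock time, dirty bytes) copied from the two quoted hastctl samples;
# both timestamps are CEST, so naive datetimes are enough here
first = (datetime(2010, 8, 12, 14, 11, 18), 102651396096)
second = (datetime(2010, 8, 13, 9, 48, 6), 80425779200)


def sync_progress(first, second):
    """Return (bytes synced, elapsed seconds) between two samples."""
    synced = first[1] - second[1]
    elapsed = (second[0] - first[0]).total_seconds()
    return synced, elapsed


synced, elapsed = sync_progress(first, second)
rate = synced / elapsed  # average bytes per second
extents_per_minute = synced / EXTENT_SIZE / (elapsed / 60)

print(f"synced  : {synced} bytes ({synced // EXTENT_SIZE} extents)")
print(f"elapsed : {elapsed:.0f} s")
print(f"rate    : {rate / 1024:.0f} KiB/s, {extents_per_minute:.1f} extents/min")
```

With these inputs the average comes out at roughly 300 KiB/s, or about nine 2 MB extents per minute.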
It is still running at 200-300 kB/s according to gstat.

The network is plenty fast: as I mentioned in an earlier mail, once the
initial sync is done I can reach almost wire speed, over 100 megabytes
per second. Just for reference, an iperf test:

# iperf -c 192.168.0.15
------------------------------------------------------------
Client connecting to 192.168.0.15, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.14 port 64049 connected with 192.168.0.15 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.08 GBytes   929 Mbits/sec

Even scp is fast:

# scp FreeBSD-8.1-RELEASE-amd64-disc1.iso 192.168.0.15:/data/
FreeBSD-8.1-RELEASE-amd64-disc1.iso           100%  682MB  85.2MB/s   00:08

But HAST is humming along:

# while true; do date && hastctl status && sleep 60; done
Wed Aug 18 00:04:45 CEST 2010
mdtest:
  role: primary
  provname: mdtest
  localpath: /dev/md0
  extentsize: 2097152
  keepdirty: 64
  remoteaddr: 192.168.0.15
  replication: memsync
  status: complete
  dirty: 16909336576 bytes
Wed Aug 18 00:05:45 CEST 2010
mdtest:
  role: primary
  provname: mdtest
  localpath: /dev/md0
  extentsize: 2097152
  keepdirty: 64
  remoteaddr: 192.168.0.15
  replication: memsync
  status: complete
  dirty: 16890462208 bytes
Wed Aug 18 00:06:45 CEST 2010
mdtest:
  role: primary
  provname: mdtest
  localpath: /dev/md0
  extentsize: 2097152
  keepdirty: 64
  remoteaddr: 192.168.0.15
  replication: memsync
  status: complete
  dirty: 16869490688 bytes
Wed Aug 18 00:07:45 CEST 2010
mdtest:
  role: primary
  provname: mdtest
  localpath: /dev/md0
  extentsize: 2097152
  keepdirty: 64
  remoteaddr: 192.168.0.15
  replication: memsync
  status: complete
  dirty: 16850616320 bytes
^C
#

Thanks,

Thomas Steen Rasmussen

From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 09:10:54 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 853BE1065695; Wed, 18 Aug 2010 09:10:54 +0000 (UTC) (envelope-from daichi@freebsd.org)
Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 55C188FC23; Wed, 18 Aug 2010 09:10:54 +0000 (UTC)
Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTPSA id 12B6E12543B; Wed, 18 Aug 2010 17:52:34 +0900 (JST)
Message-ID: <4C6B9F51.1060009@freebsd.org>
Date: Wed, 18 Aug 2010 17:52:33 +0900
From: Daichi GOTO
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.11) Gecko/20100724 Thunderbird/3.0.6
MIME-Version: 1.0
To: ed@80386.nl, freebsd-current@freebsd.org, freebsd-fs@freebsd.org, ozawa@ongs.co.jp
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc:
Subject: unionfs a little improvement
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 18 Aug 2010 09:10:54 -0000

Hi Ed and unionfs fans,

Ed pointed out a contradiction between the current unionfs
implementation and its manual page, and sent me a patch.
Thanks Ed ;)

----
Index: sys/fs/unionfs/union_vfsops.c
===================================================================
--- sys/fs/unionfs/union_vfsops.c	(revision 211093)
+++ sys/fs/unionfs/union_vfsops.c	(working copy)
@@ -89,7 +89,6 @@
 	u_short ufile;
 	unionfs_copymode copymode;
 	unionfs_whitemode whitemode;
-	struct componentname fakecn;
 	struct nameidata nd, *ndp;
 	struct vattr va;
 
@@ -280,26 +279,6 @@
 		mp->mnt_flag |= ump->um_uppervp->v_mount->mnt_flag & MNT_RDONLY;
 
 	/*
-	 * Check whiteout
-	 */
-	if ((mp->mnt_flag & MNT_RDONLY) == 0) {
-		memset(&fakecn, 0, sizeof(fakecn));
-		fakecn.cn_nameiop = LOOKUP;
-		fakecn.cn_thread = td;
-		error = VOP_WHITEOUT(ump->um_uppervp, &fakecn, LOOKUP);
-		if (error) {
-			if (below) {
-				VOP_UNLOCK(ump->um_uppervp, 0);
-				vrele(upperrootvp);
-			} else
-				vput(ump->um_uppervp);
-			free(ump, M_UNIONFSMNT);
-			mp->mnt_data = NULL;
-			return (error);
-		}
-	}
-
-	/*
 	 * Unlock the node
 	 */
 	VOP_UNLOCK(ump->um_uppervp, 0);
----

Ed's message here:
----
Just for fun I was hacking on a writable bootcd, using unionfs and
tmpfs. I noticed tmpfs doesn't support whiteouts (yet). This prevents
unionfs from mounting tmpfs on top. I do want to be able to use tmpfs,
even if it means we can't get any whiteouts.

The manpage says the following:

    Without whiteout support from the file system backing the upper
    layer, there is no way that delete and rename operations on lower
    layer objects can be done. EROFS is returned for this kind of
    operations along with any others which would make modifications
    to the lower layer, such as chmod(1).

This seems to contradict the current behaviour, which is to deny the
mount altogether. The attached patch makes it work, but instead of
EROFS, it now returns EOPNOTSUPP, as generated by VOP_WHITEOUT().
----

It looks reasonable, and the patch is simple and effective, I guess.
If you unionfs or fs guys have any comments on this patch, please let
me know.

After some tests and a couple of weeks, I'll commit Ed's patch if there
are no objections.

--
Daichi GOTO
81-42-316-7945 | daichi@ongs.co.jp | http://www.ongs.co.jp
LinkedIn: http://linkedin.com/in/daichigoto

From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 10:48:54 2010
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48C991065670; Wed, 18 Aug 2010 10:48:54 +0000 (UTC) (envelope-from ed@hoeg.nl)
Received: from mx0.hoeg.nl (unknown [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id DA8F08FC13; Wed, 18 Aug 2010 10:48:53 +0000 (UTC)
Received: by mx0.hoeg.nl (Postfix, from userid 1000) id 0EA682A28CDB; Wed, 18 Aug 2010 12:48:53 +0200 (CEST)
Date: Wed, 18 Aug 2010 12:48:53 +0200
From: Ed Schouten
To: Daichi GOTO
Message-ID: <20100818104853.GB2978@hoeg.nl>
References: <4C6B9F51.1060009@freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ZQ93ZA/51YEIDw4/"
Content-Disposition: inline
In-Reply-To: <4C6B9F51.1060009@freebsd.org>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Mounting cd9660 multiple times gives EBUSY [Was: unionfs a little improvement]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 18 Aug 2010 10:48:54 -0000

--ZQ93ZA/51YEIDw4/
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hi Daichi,

I think Keith Packard of Xorg once wrote a commit message along the
lines of "5000 lines of code removed, feature added". This seems to be
similar, albeit on a smaller scale.
;-)

Apart from this issue with unionfs, I am also experiencing another issue, where for some reason I cannot perform a second mount of the CD right after booting the system. Basically, my WIP FreeBSD boot CD does the following (but written in C):

	mount -t cd9660 /dev/iso9660/freebsd /mnt
	mount -t tmpfs none /tmp
	mount -t unionfs /tmp /mnt
	mount -t devfs none /mnt/dev
	chroot /mnt /sbin/init

The first step fails with EBUSY. I use the following hack to get it working, but I don't think it's the proper way to solve it:

%%%
Index: sys/geom/geom_vfs.c
===================================================================
--- sys/geom/geom_vfs.c	(revision 211093)
+++ sys/geom/geom_vfs.c	(working copy)
@@ -162,8 +162,10 @@
 
 	*cpp = NULL;
 	bo = &vp->v_bufobj;
+#if 0
 	if (bo->bo_private != vp)
 		return (EBUSY);
+#endif
 
 	pp = g_dev_getprovider(vp->v_rdev);
 	if (pp == NULL)
%%%

I am not familiar enough with GEOM/VFS to understand the impact of this change. What does it actually mean if bo->bo_private != vp?
--=20 Ed Schouten WWW: http://80386.nl/ --ZQ93ZA/51YEIDw4/ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) iEYEARECAAYFAkxrupUACgkQ52SDGA2eCwVOWgCfRlwSVs8Fs3iGD2ekr/cwQmMB Y18An1UvOsrA+Xsys2nbvClJh8SoayaR =nCoj -----END PGP SIGNATURE----- --ZQ93ZA/51YEIDw4/-- From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 11:15:08 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B26A1065696; Wed, 18 Aug 2010 11:15:08 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [213.150.42.155]) by mx1.freebsd.org (Postfix) with ESMTP id B5CCB8FC18; Wed, 18 Aug 2010 11:15:07 +0000 (UTC) Received: from [10.32.67.66] (fw.int.webpartner.dk [213.150.34.98]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id A649363A821; Wed, 18 Aug 2010 13:15:06 +0200 (CEST) X-DKIM: OpenDKIM Filter v2.1.3 mail.tyknet.dk A649363A821 Authentication-Results: mail.tyknet.dk; dkim=none (no signature); dkim-adsp=none Message-ID: <4C6BC0BA.9030303@gibfest.dk> Date: Wed, 18 Aug 2010 13:15:06 +0200 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4C57E20E.2030908@gibfest.dk> <20100806135001.GF1710@garage.freebsd.pl> <4C5ECA78.6010803@gibfest.dk> <20100810075528.GA1754@garage.freebsd.pl> <4C61CF4D.4060009@gibfest.dk> <4C651B7E.5000805@gibfest.dk> <20100816221059.GE2611@garage.freebsd.pl> <4C6B08BD.9080206@gibfest.dk> <20100818110655.GA2177@garage.freebsd.pl> In-Reply-To: <20100818110655.GA2177@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: 
Re: HAST initial sync speed
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 18 Aug 2010 11:15:08 -0000

On 18-08-2010 13:06, Pawel Jakub Dawidek wrote:
> On Wed, Aug 18, 2010 at 12:10:05AM +0200, Thomas Steen Rasmussen wrote:
>
>> I performed the tests with md devices like you asked. It seems like using
>> memory disks doesn't make any difference. It is still running at
>> 200-300 kB/s according to gstat. The network is plenty fast: as I
>> mentioned in an earlier mail, once the initial sync is done, I can reach
>> almost wire speed, over 100 megabytes per second.
>>
> Could you edit sbin/hastd/proto_common.c and change MAX_SEND_SIZE at the
> beginning of the file from 131072 to 32768?
>
> Then do the following:
>
> # cd /usr/src
> # make buildenv
> # cd sbin/hastd
> # make && make install
> ^D
>
> And please rerun the test with md(4) devices.
>
>

Woop!

That certainly did something: it is currently syncing the md device at almost 80 megabytes/sec according to gstat!
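For readers wondering what a constant like MAX_SEND_SIZE does, here is a minimal sketch of a send loop that never hands more than MAX_SEND_SIZE bytes to a single write call. This is not the code from sbin/hastd/proto_common.c. The names send_chunked, write_fn and sink_write are made up for illustration, and a callback stands in for the socket so the loop can be tried without a network.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_SEND_SIZE 32768	/* the value suggested in this thread */

/* Hypothetical callback: writes up to len bytes, returns bytes written or -1. */
typedef ssize_t (*write_fn)(void *ctx, const unsigned char *buf, size_t len);

/* Hypothetical sink used for experiments: accepts everything it is given. */
static ssize_t
sink_write(void *ctx, const unsigned char *buf, size_t len)
{
	(void)ctx;
	(void)buf;
	return ((ssize_t)len);
}

/*
 * Send 'size' bytes from 'data', passing at most MAX_SEND_SIZE bytes to
 * each call of 'wr'. Returns 0 on success and -1 on error, and stores
 * the number of write calls made in *ncalls.
 */
static int
send_chunked(write_fn wr, void *ctx, const unsigned char *data, size_t size,
    unsigned *ncalls)
{
	size_t chunk;
	ssize_t done;

	*ncalls = 0;
	while (size > 0) {
		chunk = size < MAX_SEND_SIZE ? size : MAX_SEND_SIZE;
		done = wr(ctx, data, chunk);
		if (done <= 0)
			return (-1);	/* error, or no progress */
		data += done;		/* short writes are simply continued */
		size -= (size_t)done;
		(*ncalls)++;
	}
	return (0);
}
```

With the cap at 32768, a 131072-byte buffer goes out in four calls instead of one. The thread does not say why the smaller cap is so much faster on this hardware, so the sketch only shows the mechanism, not the cause.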
Regards, Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 11:26:20 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E3181065670; Wed, 18 Aug 2010 11:26:20 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [213.150.42.155]) by mx1.freebsd.org (Postfix) with ESMTP id 3A0D28FC13; Wed, 18 Aug 2010 11:26:20 +0000 (UTC) Received: from [10.32.67.66] (fw.int.webpartner.dk [213.150.34.98]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id 7420863A827; Wed, 18 Aug 2010 13:26:19 +0200 (CEST) X-DKIM: OpenDKIM Filter v2.1.3 mail.tyknet.dk 7420863A827 Authentication-Results: mail.tyknet.dk; dkim=none (no signature); dkim-adsp=none Message-ID: <4C6BC35B.9040000@gibfest.dk> Date: Wed, 18 Aug 2010 13:26:19 +0200 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <4C57E20E.2030908@gibfest.dk> <20100806135001.GF1710@garage.freebsd.pl> <4C5ECA78.6010803@gibfest.dk> <20100810075528.GA1754@garage.freebsd.pl> <4C61CF4D.4060009@gibfest.dk> <4C651B7E.5000805@gibfest.dk> <20100816221059.GE2611@garage.freebsd.pl> <4C6B08BD.9080206@gibfest.dk> <20100818110655.GA2177@garage.freebsd.pl> <4C6BC0BA.9030303@gibfest.dk> In-Reply-To: <4C6BC0BA.9030303@gibfest.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: HAST initial sync speed X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Aug 2010 11:26:20 -0000 On 18-08-2010 13:15, Thomas Steen 
Rasmussen wrote:
> On 18-08-2010 13:06, Pawel Jakub Dawidek wrote:
>> On Wed, Aug 18, 2010 at 12:10:05AM +0200, Thomas Steen Rasmussen wrote:
>>> I performed the tests with md devices like you asked. It seems like
>>> using memory disks doesn't make any difference. It is still running at
>>> 200-300 kB/s according to gstat. The network is plenty fast: as I
>>> mentioned in an earlier mail, once the initial sync is done, I can
>>> reach almost wire speed, over 100 megabytes per second.
>> Could you edit sbin/hastd/proto_common.c and change MAX_SEND_SIZE at the
>> beginning of the file from 131072 to 32768?
>>
>> Then do the following:
>>
>> # cd /usr/src
>> # make buildenv
>> # cd sbin/hastd
>> # make && make install
>> ^D
>>
>> And please rerun the test with md(4) devices.
>>
> Woop!
>
> That certainly did something: it is currently syncing the md
> device at almost 80 megabytes/sec according to gstat!
>

Hello again,

Sorry for responding to my own post. I recreated the HAST setup with the four hard drives, and it is now saturating the gigabit link: it is syncing at a steady rate of 112 megabytes per second, meaning that each of the disks is reading/writing at just under 30 megabytes per second. It would probably be even faster if the network weren't limiting it.

I built ZFS on top of it and everything seems to be working as expected.

Thank you again for looking at this.
Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 12:42:10 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0EE31065693; Wed, 18 Aug 2010 12:42:10 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [213.150.42.155]) by mx1.freebsd.org (Postfix) with ESMTP id 7C4988FC1A; Wed, 18 Aug 2010 12:42:10 +0000 (UTC) Received: from [10.32.67.66] (fw.int.webpartner.dk [213.150.34.98]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id BA47863A821; Wed, 18 Aug 2010 14:42:09 +0200 (CEST) X-DKIM: OpenDKIM Filter v2.1.3 mail.tyknet.dk BA47863A821 Authentication-Results: mail.tyknet.dk; dkim=none (no signature); dkim-adsp=none Message-ID: <4C6BD521.1060807@gibfest.dk> Date: Wed, 18 Aug 2010 14:42:09 +0200 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <20100806135001.GF1710@garage.freebsd.pl> <4C5ECA78.6010803@gibfest.dk> <20100810075528.GA1754@garage.freebsd.pl> <4C61CF4D.4060009@gibfest.dk> <4C651B7E.5000805@gibfest.dk> <20100816221059.GE2611@garage.freebsd.pl> <4C6B08BD.9080206@gibfest.dk> <20100818110655.GA2177@garage.freebsd.pl> <4C6BC0BA.9030303@gibfest.dk> <4C6BC35B.9040000@gibfest.dk> <20100818121133.GC2177@garage.freebsd.pl> In-Reply-To: <20100818121133.GC2177@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: HAST initial sync speed X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Aug 2010 
12:42:10 -0000

On 18-08-2010 14:11, Pawel Jakub Dawidek wrote:
> On Wed, Aug 18, 2010 at 01:26:19PM +0200, Thomas Steen Rasmussen wrote:
>
>> Sorry for responding to my own post. I recreated the HAST setup with the
>> four hard drives, and it is now saturating the gigabit link: it is
>> syncing at a steady rate of 112 megabytes per second, meaning that each
>> of the disks is reading/writing at just under 30 megabytes per second.
>> It would probably be even faster if the network weren't limiting it.
>>
>> I built ZFS on top of it and everything seems to be working as expected.
>>
>> Thank you again for looking at this.
>>
> I'm very glad to hear that. I changed the default MAX_SEND_SIZE to 32kB
> as it seems to be the safest setting. I had a similar problem in ggate.
>
>

OK. So it will be (or has been) committed to the tree? What about an MFC? I should probably know this, but what does this mean in practice: when will this fix be in -STABLE, or even in a -RELEASE?

> BTW. What network cards do you use there?
> > The servers has four of these: bce0: Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 13:10:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63DA31065697; Wed, 18 Aug 2010 13:10:19 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [213.150.42.155]) by mx1.freebsd.org (Postfix) with ESMTP id 1EF068FC16; Wed, 18 Aug 2010 13:10:18 +0000 (UTC) Received: from [10.32.67.66] (fw.int.webpartner.dk [213.150.34.98]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id 25E3663A827; Wed, 18 Aug 2010 15:10:18 +0200 (CEST) X-DKIM: OpenDKIM Filter v2.1.3 mail.tyknet.dk 25E3663A827 Authentication-Results: mail.tyknet.dk; dkim=none (no signature); dkim-adsp=none Message-ID: <4C6BDBB9.3020007@gibfest.dk> Date: Wed, 18 Aug 2010 15:10:17 +0200 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100711 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Pawel Jakub Dawidek References: <20100810075528.GA1754@garage.freebsd.pl> <4C61CF4D.4060009@gibfest.dk> <4C651B7E.5000805@gibfest.dk> <20100816221059.GE2611@garage.freebsd.pl> <4C6B08BD.9080206@gibfest.dk> <20100818110655.GA2177@garage.freebsd.pl> <4C6BC0BA.9030303@gibfest.dk> <4C6BC35B.9040000@gibfest.dk> <20100818121133.GC2177@garage.freebsd.pl> <4C6BD521.1060807@gibfest.dk> <20100818125856.GE2177@garage.freebsd.pl> In-Reply-To: <20100818125856.GE2177@garage.freebsd.pl> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: HAST initial sync speed X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 18 Aug 2010 13:10:19 -0000 On 18-08-2010 14:58, Pawel Jakub Dawidek wrote: > On Wed, Aug 18, 2010 at 02:42:09PM +0200, Thomas Steen Rasmussen wrote: > >> On 18-08-2010 14:11, Pawel Jakub Dawidek wrote: >> >>> I'm very glad to hear that. I changed the default MAX_SEND_SIZE to 32kB >>> as it seems to be the safest setting. I had similar problem in ggate. >>> >>> >>> >> OK. So it will be (or has been) committed to the tree ? What about MFC ? >> I should >> probably know this, but: >> What does this mean, like, when will this fix be in -STABLE or even in a >> -RELEASE ? >> > I just committed it to FreeBSD-CURRENT, it will be merged to > FreeBSD-STABLE (8-STABLE) in three weeks and will be available in > 8.2-RELEASE when it is released. > > Awesome. If you ever come to Denmark, I'm buying dinner and beer! Thank you for all your work, Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 15:20:14 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EFE9D106566B; Wed, 18 Aug 2010 15:20:14 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (unknown [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 8BC138FC13; Wed, 18 Aug 2010 15:20:14 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id BFC432A28CDB; Wed, 18 Aug 2010 17:20:13 +0200 (CEST) Date: Wed, 18 Aug 2010 17:20:13 +0200 From: Ed Schouten To: Pawel Jakub Dawidek Message-ID: <20100818152013.GF2978@hoeg.nl> References: <4C6B9F51.1060009@freebsd.org> <20100818104853.GB2978@hoeg.nl> <20100818121550.GD2177@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="zBZpvCNcoXwafAjP" Content-Disposition: inline In-Reply-To: <20100818121550.GD2177@garage.freebsd.pl> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@freebsd.org, Daichi GOTO , 
freebsd-current@freebsd.org
Subject: Re: Mounting cd9660 multiple times gives EBUSY [Was: unionfs a little improvement]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 18 Aug 2010 15:20:15 -0000

--zBZpvCNcoXwafAjP
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

* Pawel Jakub Dawidek wrote:
> What you are trying to do here is to mount /dev/iso9660/freebsd for the
> second time? This is not supported. The check is there to prevent doing
> this, as it will panic on you when you try to unmount the first mount (not
> really a problem in your case, as the first mount is /, so you probably
> don't want to unmount it, but it is a problem in general).
>
> You should be able to reproduce the panic with your patch applied by
> doing the following:
>
> 	# mount -t cd9660 /dev/iso9660/freebsd /mnt0
> 	# mount -t cd9660 /dev/iso9660/freebsd /mnt1
> 	# umount /mnt0

Ah, I see. Well, I changed my setup to use an md root now. Works quite nicely. Screenshot:

	http://80386.nl/pub/livecd.png

Thanks for the explanation!
--=20 Ed Schouten WWW: http://80386.nl/ --zBZpvCNcoXwafAjP Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) iEYEARECAAYFAkxr+i0ACgkQ52SDGA2eCwWKvwCcCnXjomyWP3sqeYnzMe5M8zHH 3SIAnRxazLWYyzLb6uSgYsDNFyR1OS4Q =6ev4 -----END PGP SIGNATURE----- --zBZpvCNcoXwafAjP-- From owner-freebsd-fs@FreeBSD.ORG Wed Aug 18 18:44:49 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A325910656A8 for ; Wed, 18 Aug 2010 18:44:49 +0000 (UTC) (envelope-from mwlucas@bewilderbeast.blackhelicopters.org) Received: from bewilderbeast.blackhelicopters.org (bewilderbeast.blackhelicopters.org [198.22.63.8]) by mx1.freebsd.org (Postfix) with ESMTP id 6330C8FC1B for ; Wed, 18 Aug 2010 18:44:48 +0000 (UTC) Received: from bewilderbeast.blackhelicopters.org (localhost [127.0.0.1]) by bewilderbeast.blackhelicopters.org (8.14.4/8.14.4) with ESMTP id o7IIX0Q9098991 for ; Wed, 18 Aug 2010 14:33:01 -0400 (EDT) (envelope-from mwlucas@bewilderbeast.blackhelicopters.org) Received: (from mwlucas@localhost) by bewilderbeast.blackhelicopters.org (8.14.4/8.14.4/Submit) id o7IIX08e098989 for fs@freebsd.org; Wed, 18 Aug 2010 14:33:00 -0400 (EDT) (envelope-from mwlucas) Date: Wed, 18 Aug 2010 14:33:00 -0400 From: "Michael W. Lucas" To: fs@freebsd.org Message-ID: <20100818183300.GA98970@bewilderbeast.blackhelicopters.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.5 (bewilderbeast.blackhelicopters.org [127.0.0.1]); Wed, 18 Aug 2010 14:33:01 -0400 (EDT) Cc: Subject: HAST secondary identifying when it's dirty? 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 18 Aug 2010 18:44:49 -0000

Hi,

I have a HAST device set up between two systems. It appears that the secondary doesn't know when it's dirty. Is there any way for the secondary to know that its copy is incomplete?

For example, I had to resync my HAST partition due to a firewall issue. While the master knows the secondary is dirty, the secondary thinks it's complete. Here is the status output from both nodes:

master# hastctl status
mirror:
  role: primary
  provname: mirror
  localpath: /dev/da0s2
  extentsize: 2097152
  keepdirty: 64
  remoteaddr: 192.168.0.1
  replication: memsync
  status: complete
  dirty: 24385683456 bytes

secondary# hastctl status
mirror:
  role: secondary
  provname: mirror
  localpath: /dev/da0s2
  extentsize: 2097152
  keepdirty: 0
  remoteaddr: 192.168.0.2
  replication: memsync
  status: complete
  dirty: 0 bytes

The secondary doesn't realize it's 24GB out of sync. Is there any way for the secondary to get this information?

Thanks,
==ml

--
Michael W.
Lucas mwlucas@BlackHelicopters.org http://www.MichaelWLucas.com/, http://blather.MichaelWLucas.com/ New book available: Network Flow Analysis http://www.networkflowanalysis.com/ From owner-freebsd-fs@FreeBSD.ORG Thu Aug 19 00:40:54 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48A6810656AA; Thu, 19 Aug 2010 00:40:54 +0000 (UTC) (envelope-from daichi@ongs.co.jp) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id 1BD6D8FC0A; Thu, 19 Aug 2010 00:40:53 +0000 (UTC) Received: from parancell.ongs.co.jp (dullmdaler.ongs.co.jp [202.216.246.94]) by natial.ongs.co.jp (Postfix) with ESMTPSA id 2015A12543B; Thu, 19 Aug 2010 09:23:35 +0900 (JST) Message-ID: <4C6C797C.5000409@ongs.co.jp> Date: Thu, 19 Aug 2010 09:23:24 +0900 From: Daichi GOTO User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.11) Gecko/20100724 Thunderbird/3.0.6 MIME-Version: 1.0 To: ed@80386.nl, freebsd-current@freebsd.org, freebsd-fs@freebsd.org, ozawa@ongs.co.jp Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: unionfs a little improvement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Aug 2010 00:40:54 -0000 Hi Ed and unionfs fan gyus. Ed pointed out a contradict behavior between current unionfs implementation and its manual, and sent me a patch. 
Thanks Ed ;)

----
Index: sys/fs/unionfs/union_vfsops.c
===================================================================
--- sys/fs/unionfs/union_vfsops.c	(revision 211093)
+++ sys/fs/unionfs/union_vfsops.c	(working copy)
@@ -89,7 +89,6 @@
 	u_short ufile;
 	unionfs_copymode copymode;
 	unionfs_whitemode whitemode;
-	struct componentname fakecn;
 	struct nameidata nd, *ndp;
 	struct vattr va;
 
@@ -280,26 +279,6 @@
 		mp->mnt_flag |= ump->um_uppervp->v_mount->mnt_flag & MNT_RDONLY;
 
 	/*
-	 * Check whiteout
-	 */
-	if ((mp->mnt_flag & MNT_RDONLY) == 0) {
-		memset(&fakecn, 0, sizeof(fakecn));
-		fakecn.cn_nameiop = LOOKUP;
-		fakecn.cn_thread = td;
-		error = VOP_WHITEOUT(ump->um_uppervp, &fakecn, LOOKUP);
-		if (error) {
-			if (below) {
-				VOP_UNLOCK(ump->um_uppervp, 0);
-				vrele(upperrootvp);
-			} else
-				vput(ump->um_uppervp);
-			free(ump, M_UNIONFSMNT);
-			mp->mnt_data = NULL;
-			return (error);
-		}
-	}
-
-	/*
 	 * Unlock the node
 	 */
 	VOP_UNLOCK(ump->um_uppervp, 0);
----

Ed's message here:
----
Just for fun I was hacking on a writable bootcd, using unionfs and tmpfs. I noticed tmpfs doesn't support whiteouts (yet). This prevents unionfs from mounting tmpfs on top. I do want to be able to use tmpfs, even if it means we can't get any whiteouts.

The manpage says the following:

	Without whiteout support from the file system backing the upper
	layer, there is no way that delete and rename operations on lower
	layer objects can be done. EROFS is returned for this kind of
	operations along with any others which would make modifications
	to the lower layer, such as chmod(1).

This seems to contradict the current behaviour, which is to deny the mount altogether. The attached patch makes it work, but instead of EROFS, it now returns EOPNOTSUPP, as generated by VOP_WHITEOUT().
----

It looks reasonable to me, and the patch is simple and effective, I guess. If any unionfs or fs folks have thoughts on this patch, please let me know.
After some tests and a couple of weeks after, I'll commit ed's patch if there is no objections. -- Daichi GOTO CEO | ONGS Inc. 81-42-316-7945 | daichi@ongs.co.jp | http://www.ongs.co.jp LinkedIn: http://linkedin.com/in/daichigoto From owner-freebsd-fs@FreeBSD.ORG Thu Aug 19 03:03:01 2010 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 62B2A1065695 for ; Thu, 19 Aug 2010 03:03:01 +0000 (UTC) (envelope-from gnehzuil@gmail.com) Received: from mail-px0-f182.google.com (mail-px0-f182.google.com [209.85.212.182]) by mx1.freebsd.org (Postfix) with ESMTP id 33BE68FC14 for ; Thu, 19 Aug 2010 03:03:01 +0000 (UTC) Received: by pxi17 with SMTP id 17so619613pxi.13 for ; Wed, 18 Aug 2010 20:03:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from :user-agent:mime-version:to:cc:subject:content-type; bh=ODtJu98Yx8pyMX2xK/Xg1mIntcHCEbUwRtDnSWc7ESk=; b=E1A8QvuRl7xGgRNUxuCbE6iH29B4oSyc1UNBvYBXm1BMC4SxdNzq29Ba5SUdo0SxPq 0qWW7uWU/0RSPywwKzROnuhag5Q4fQL/cQEU/yWamgTYsbgKr0T8wxX0jt8EvgHlUZnK Ds6qU/UtLRoFJ655YJxkG7fZbsODHXos3nsZo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:cc:subject :content-type; b=sksIGfqqNRGb4tW19vxNVmonQMZu6rLBo57SjrAqX2nEBckIxHDHfY1ciVSa3B62sn InmBuLuVaruWZVeQf8FPdRwOuXPtnWZ93PDczM9T05cQiPfwk3JDNAOKJ9Gzfktl9hcv dSq4uynFmVZnscrUwN+4VV61K71PbW9pmfoCs= Received: by 10.142.218.9 with SMTP id q9mr7879773wfg.64.1282185471766; Wed, 18 Aug 2010 19:37:51 -0700 (PDT) Received: from [192.168.1.250] ([166.111.68.197]) by mx.google.com with ESMTPS id z1sm1085369wfd.15.2010.08.18.19.37.47 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 18 Aug 2010 19:37:50 -0700 (PDT) Message-ID: <4C6C98B4.4070909@gmail.com> Date: Thu, 19 Aug 2010 10:36:36 +0800 From: gnehzuil User-Agent: Thunderbird 2.0.0.24 (X11/20100317) 
MIME-Version: 1.0
To: fs@freebsd.org
Content-Type: multipart/mixed; boundary="------------070305070503050500020300"
Cc: jhb@FreeBSD.org
Subject: [patch] ext2fs + preallocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 19 Aug 2010 03:03:01 -0000

This is a multi-part message in MIME format.
--------------070305070503050500020300
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi all,

Attached is a patch that implements a block preallocation algorithm in ext2fs. I implemented it during FreeBSD SoC 2010.

The patch implements the in-memory ext2/3 block preallocation algorithm based on reservation windows. It uses an RB-tree to index block allocation requests and reserves a number of blocks for each file that has requested a block.

When a file requests a block for the first time, the allocator looks for a block that is in the same cylinder group as the inode, is not inside any other reservation window in the RB-tree, and is followed by some contiguous free blocks. It records the block's position and the length of the contiguous free run in a data structure and inserts that structure into the RB-tree. When the file requests another block, the allocator looks up the corresponding structure in the RB-tree. If it finds one, the next free block is allocated to the file directly. Otherwise, it searches for a new block again.

I have run some benchmarks to test this algorithm. Please see the results on the wiki page (http://wiki.freebsd.org/SOC2010ZhengLiu). The improvement is largest when the number of threads is smaller than 4. With more than 4 threads, performance improves only a little.

Please test it.
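The description above can be illustrated with a minimal user-space sketch of the reservation-window idea. This is not the code from the attached patch: it keeps the windows in a small sorted array instead of an RB-tree, it ignores cylinder groups and the block bitmap, and every name in it (struct rsv_win, rsv_reserve, rsv_alloc, RSV_SIZE, MAX_WINS) is hypothetical.

```c
#define RSV_SIZE 8	/* hypothetical window length, in blocks */
#define MAX_WINS 16	/* hypothetical cap on the number of windows */

/* One reservation window: blocks [start, end] set aside for one file. */
struct rsv_win {
	int start;	/* -1 while the file has no window yet */
	int end;
	int next;	/* next block to hand out from the window */
};

/* Windows sorted by start block; a simple stand-in for the RB-tree. */
static struct rsv_win *wins[MAX_WINS];
static int nwins;

/*
 * Reserve RSV_SIZE blocks for 'rp' at or after 'goal', skipping every
 * range that already belongs to another window. 'nblocks' is the total
 * number of blocks. Returns 0 on success, -1 if nothing fits.
 */
static int
rsv_reserve(struct rsv_win *rp, int goal, int nblocks)
{
	int cur = goal, i, pos;

	if (nwins == MAX_WINS)
		return (-1);
	for (i = 0; i < nwins; i++) {
		if (wins[i]->end < cur)
			continue;		/* window lies before cur */
		if (cur + RSV_SIZE <= wins[i]->start)
			break;			/* gap before window i fits */
		cur = wins[i]->end + 1;		/* overlap: move past it */
	}
	if (cur + RSV_SIZE > nblocks)
		return (-1);
	pos = i;
	for (i = nwins; i > pos; i--)		/* keep the array sorted */
		wins[i] = wins[i - 1];
	wins[pos] = rp;
	nwins++;
	rp->start = cur;
	rp->end = cur + RSV_SIZE - 1;
	rp->next = cur;
	return (0);
}

/*
 * Hand out the next block for a file. The first call reserves a window;
 * later calls take the next block from that window directly.
 */
static int
rsv_alloc(struct rsv_win *rp, int goal, int nblocks)
{
	if (rp->start < 0 && rsv_reserve(rp, goal, nblocks) != 0)
		return (-1);
	if (rp->next > rp->end)
		return (-1);	/* window used up; real code would re-reserve */
	return (rp->next++);
}
```

If two files, each starting with { -1, -1, -1 }, allocate alternately with the same goal of 0, the first receives blocks 0, 1, 2 and the second receives 8, 9, 10, so each file's blocks stay contiguous instead of interleaving. Keeping the sketch free of the bitmap is a deliberate simplification; the patch consults the bitmap before handing out a block.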
Thanks and best regards, lz --------------070305070503050500020300 Content-Type: text/x-patch; name="ext2fs_prealloc.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ext2fs_prealloc.patch" diff -urN /usr/src/sys/fs/ext2fs/ext2_alloc.c new/ext2_alloc.c --- /usr/src/sys/fs/ext2fs/ext2_alloc.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_alloc.c 2010-08-19 02:47:29.000000000 +0800 @@ -50,6 +50,9 @@ #include #include #include +#include + +#define phy_blk(cg, fs) (((cg) * (fs->e2fs->e2fs_fpg)) + fs->e2fs->e2fs_first_dblock) static daddr_t ext2_alloccg(struct inode *, int, daddr_t, int); static u_long ext2_dirpref(struct inode *); @@ -59,37 +62,524 @@ int)); static daddr_t ext2_nodealloccg(struct inode *, int, daddr_t, int); static daddr_t ext2_mapsearch(struct m_ext2fs *, char *, daddr_t); + +/* For reservation window */ +static u_long ext2_alloc_blk(struct inode *, int, struct buf *, int32_t, struct ext2_rsv_win *); +static int ext2_alloc_new_rsv(struct inode *, int, struct buf *, int32_t); +static int ext2_bpref_in_rsv(struct ext2_rsv_win *, int32_t); +static int ext2_find_rsv(struct ext2_rsv_win *, struct ext2_rsv_win *, + struct m_ext2fs *, int32_t, int); +static void ext2_remove_rsv_win(struct m_ext2fs *, struct ext2_rsv_win *); +static u_long ext2_rsvalloc(struct m_ext2fs *, struct inode *, + int, struct buf *, int32_t, int); +static daddr_t ext2_search_next_block(struct m_ext2fs *, char *, int, int); +static struct ext2_rsv_win *ext2_search_rsv(struct ext2_rsv_win_tree *, int32_t); + +RB_GENERATE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp); + /* * Allocate a block in the file system. * - * A preference may be optionally specified. If a preference is given - * the following hierarchy is used to allocate a block: - * 1) allocate the requested block. - * 2) allocate a rotationally optimal block in the same cylinder. - * 3) allocate a block in the same cylinder group. 
- * 4) quadradically rehash into other cylinder groups, until an - * available block is located. - * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. - * - * A preference may be optionally specified. If a preference is given - * the following hierarchy is used to allocate a block: - * 1) allocate the requested block. - * 2) allocate a rotationally optimal block in the same cylinder. - * 3) allocate a block in the same cylinder group. - * 4) quadradically rehash into other cylinder groups, until an - * available block is located. - * If no block preference is given the following hierarchy is used - * to allocate a block: - * 1) allocate a block in the cylinder group that contains the - * inode for the file. - * 2) quadradically rehash into other cylinder groups, until an - * available block is located. + * By given preference: + * Check whether inode has a reservation window and preference + * is within it and try to allocate a free block from + * this reservation window. + * If not, traverse RB tree to find a place, which is not in + * any window and insert it to RB tree to try to allocate a + * free block again. + * If it fails, try to allocate a free block in other cylinder + * groups without preference. + */ + +/* + * Allocate a free block. + * + * First check whether reservation window is used. + * If reservation window is used, try to allocate a free + * block from the reservation window. If it fails, traverse + * the bitmap to find a free block. + * If reservation window is not used, try to allocate + * a free block by bpref. If it fails, traverse the bitmap + * to find a free block. 
*/ +static u_long +ext2_alloc_blk(struct inode *ip, int cg, struct buf *bp, + int32_t bpref, struct ext2_rsv_win *rp) +{ + struct m_ext2fs *fs; + struct ext2mount *ump; + int bno, start, end; + char *bbp; + + fs = ip->i_e2fs; + ump = ip->i_ump; + bbp = (char *)bp->b_data; + + if (fs->e2fs_gd[cg].ext2bgd_nbfree == 0) + return (0); + + if (bpref < 0) + bpref = 0; + + /* Check whether it use reservation window */ + if (rp != NULL) { + /* + * If window's start is not in this cylinder group, + * try to allocate from the beginning, otherwise + * try to allocate from the beginning of the + * window. + */ + if (dtog(fs, rp->rsv_start) < cg) + start = 0; + else + start = rp->rsv_start; + + /* + * If window's end crosses the end of this group, + * set end variable to the end of this group. + * Otherwise, set it to the window's end. + */ + if (dtog(fs, rp->rsv_end) > cg) + end = phy_blk(cg + 1, fs) - 1; + else + end = rp->rsv_end; + + /* If preference block is within the window, try to allocate it. */ + if (start <= bpref && bpref <= end) { + bpref = dtogd(fs, bpref); + if (isclr(bbp, bpref)) { + rp->rsv_alloc_hit++; + bno = bpref; + goto gotit; + } + } else + if (dtog(fs, rp->rsv_start) == cg) + bpref = dtogd(fs, rp->rsv_start); + else + bpref = 0; + } else { + if (dtog(fs, bpref) != cg) + bpref = 0; + if (bpref != 0) { + bpref = dtogd(fs, bpref); + if (isclr(bbp, bpref)) { + bno = bpref; + goto gotit; + } + } + } + + bno = ext2_mapsearch(fs, bbp, bpref); + if (bno < 0) + return (0); + +gotit: + setbit(bbp, (daddr_t)bno); + EXT2_LOCK(ump); + fs->e2fs->e2fs_fbcount--; + fs->e2fs_gd[cg].ext2bgd_nbfree--; + fs->e2fs_fmod = 1; + EXT2_UNLOCK(ump); + bdwrite(bp); + bno = phy_blk(cg, fs) + bno; + return (bno); +} + +/* + * Initialize reservation window per inode. 
+ */ +void +ext2_init_rsv(struct inode *ip) +{ + struct ext2_rsv_win *rp; + + rp = malloc(sizeof(struct ext2_rsv_win), + M_EXT2NODE, M_WAITOK | M_ZERO); + + /* + * If malloc failed, we just do not use the + * reservation window mechanism. + */ + if (rp == NULL) + return; + + rp->rsv_start = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_end = EXT2_RSV_NOT_ALLOCATED; + + rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS; + rp->rsv_alloc_hit = 0; + + ip->i_rsv = rp; +} + +/* + * Discard reservation window. + * + * It is called during the following situations: + * 1. free an inode + * 2. sync inode + * 3. truncate a file + */ +void +ext2_discard_rsv(struct inode *ip) +{ + struct ext2_rsv_win *rp; + + if (ip->i_rsv == NULL) + return; + + rp = ip->i_rsv; + + /* If reservation window is empty, nothing to do */ + if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED) + return; + + EXT2_TREE_LOCK(ip->i_e2fs); + ext2_remove_rsv_win(ip->i_e2fs, rp); + EXT2_TREE_UNLOCK(ip->i_e2fs); + rp->rsv_goal_size = EXT2_RSV_DEFAULT_RESERVE_BLKS; +} + +/* + * Remove a ext2_rsv_win structure from RB tree. + */ +static void +ext2_remove_rsv_win(struct m_ext2fs *fs, struct ext2_rsv_win *rp) +{ + RB_REMOVE(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + rp->rsv_start = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_end = EXT2_RSV_NOT_ALLOCATED; + rp->rsv_alloc_hit = 0; +} + +/* + * Check bpref is in the reservation window. + */ +static int +ext2_bpref_in_rsv(struct ext2_rsv_win *rp, int32_t bpref) +{ + if (bpref >= 0 && (bpref < rp->rsv_start || bpref > rp->rsv_end)) + return (0); + + return (1); +} + +/* + * Search a tree node from RB tree. It includes the bpref or + * the previous one if bpref is not in any window. 
+ */ +static struct ext2_rsv_win * +ext2_search_rsv(struct ext2_rsv_win_tree *root, int32_t start) +{ + struct ext2_rsv_win *prev, *next; + + if (RB_EMPTY(root)) + return (NULL); + + next = RB_ROOT(root); + do { + prev = next; + if (start < next->rsv_start) + next = RB_LEFT(next, rsv_link); + else if (start > next->rsv_end) + next = RB_RIGHT(next, rsv_link); + else + return (next); + } while (next != NULL); + + if (prev->rsv_start > start) { + next = RB_PREV(ext2_rsv_win_tree, root, prev); + if (next != NULL) + prev = next; + } + + return (prev); +} + +/* + * Find a reservation window in the range from start to + * the end of this cylinder group. + */ +static int +ext2_find_rsv(struct ext2_rsv_win *search, struct ext2_rsv_win *rp, + struct m_ext2fs *fs, int32_t start, int cg) +{ + struct ext2_rsv_win *rsv, *prev; + int32_t cur; + int size = rp->rsv_goal_size; + + if (search == NULL) { + rp->rsv_start = start & ~7; + rp->rsv_end = start + size - 1; + rp->rsv_alloc_hit = 0; + + RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + + return (0); + } + + /* + * Make the start of the reservation window byte-aligned + * so that ext2_search_next_block() can find a free block + * with bit operations. + */ + cur = start & ~7; + rsv = search; + prev = NULL; + + while (1) { + if (cur <= rsv->rsv_end) + cur = rsv->rsv_end + 1; + + if (dtog(fs, cur) != cg) + return (-1); + + prev = rsv; + rsv = RB_NEXT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rsv); + + if (rsv == NULL) + break; + + if (cur + size <= rsv->rsv_start) + break; + } + + if (prev != rp && rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + + rp->rsv_start = cur; + rp->rsv_end = cur + size - 1; + rp->rsv_alloc_hit = 0; + + if (prev != rp) + RB_INSERT(ext2_rsv_win_tree, fs->e2fs_rsv_tree, rp); + + return (0); +} + +/* + * Find a free block in the range from bpref to + * the end of this cylinder group.
+ */ +static daddr_t +ext2_search_next_block(struct m_ext2fs *fs, char *bbp, int bpref, int cg) +{ + daddr_t bno; + int start, loc, len, map, i; + + start = bpref / NBBY; + len = howmany(fs->e2fs->e2fs_fpg, NBBY) - start; + loc = skpc(0xff, len, &bbp[start]); + if (loc == 0) + return (-1); + + i = start + len - loc; + map = bbp[i]; + bno = i * NBBY; + for (i = 1; i < (1 << NBBY); i <<= 1, bno++) { + if ((map & i) == 0) + return (bno); + } + + return (-1); +} + +/* + * Allocate a new reservation window. + */ +static int +ext2_alloc_new_rsv(struct inode *ip, int cg, struct buf *bp, int32_t bpref) +{ + struct m_ext2fs *fs; + struct ext2_rsv_win *rp, *search; + char *bbp; + int start, size, ret; + + fs = ip->i_e2fs; + rp = ip->i_rsv; + bbp = bp->b_data; + size = rp->rsv_goal_size; + + if (bpref <= 0) + start = phy_blk(cg, fs); + else + start = bpref; + + /* Dynamically increase the size of window */ + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) { + if (rp->rsv_alloc_hit > + ((rp->rsv_end - rp->rsv_start + 1) / 2)) { + size = size * 2; + if (size > EXT2_RSV_MAX_RESERVE_BLKS) + size = EXT2_RSV_MAX_RESERVE_BLKS; + rp->rsv_goal_size = size; + } + } + + EXT2_TREE_LOCK(fs); + + search = ext2_search_rsv(fs->e2fs_rsv_tree, start); + +repeat: + ret = ext2_find_rsv(search, rp, fs, start, cg); + if (ret < 0) { + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + EXT2_TREE_UNLOCK(fs); + return (-1); + } + EXT2_TREE_UNLOCK(fs); + + start = dtogd(fs, rp->rsv_start); + start = ext2_search_next_block(fs, bbp, start, cg); + if (start < 0) { + EXT2_TREE_LOCK(fs); + if (rp->rsv_end != EXT2_RSV_NOT_ALLOCATED) + ext2_remove_rsv_win(fs, rp); + EXT2_TREE_UNLOCK(fs); + return (-1); + } + + start = phy_blk(cg, fs) + start; + if (start >= rp->rsv_start && start <= rp->rsv_end) + return (0); + + search = rp; + EXT2_TREE_LOCK(fs); + goto repeat; +} + +/* + * Allocate a free block from reservation window. 
+ */ +static u_long +ext2_rsvalloc(struct m_ext2fs *fs, struct inode *ip, int cg, + struct buf *bp, int32_t bpref, int size) +{ + struct ext2_rsv_win *rp; + int ret; + + rp = ip->i_rsv; + if (rp == NULL) + return (ext2_alloc_blk(ip, cg, bp, bpref, NULL)); + + if (rp->rsv_end == EXT2_RSV_NOT_ALLOCATED || + !ext2_bpref_in_rsv(rp, bpref)) { + ret = ext2_alloc_new_rsv(ip, cg, bp, bpref); + if (ret < 0) + return (0); + } + + return (ext2_alloc_blk(ip, cg, bp, bpref, rp)); +} + +/* + * Allocate a block using a reservation window in the ext2 file system. + * + * NOTE: This function will replace the ext2_alloc() function. + */ +int +ext2_alloc_rsv(struct inode *ip, int32_t lbn, int32_t bpref, + int size, struct ucred *cred, int32_t *bnp) +{ + struct m_ext2fs *fs; + struct ext2mount *ump; + struct buf *bp; + int32_t bno = 0; + int i, cg, error; + + *bnp = 0; + fs = ip->i_e2fs; + ump = ip->i_ump; + mtx_assert(EXT2_MTX(ump), MA_OWNED); + + if (size == fs->e2fs_bsize && fs->e2fs->e2fs_fbcount == 0) + goto nospace; + if (cred->cr_uid != 0 && + fs->e2fs->e2fs_fbcount < fs->e2fs->e2fs_rbcount) + goto nospace; + + if (bpref >= fs->e2fs->e2fs_bcount) + bpref = 0; + if (bpref == 0) + cg = ino_to_cg(fs, ip->i_number); + else + cg = dtog(fs, bpref); + + /* If cg has some free blocks, then try to allocate a free block from this cg */ + if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) { + /* Read block bitmap from buffer */ + EXT2_UNLOCK(ump); + error = bread(ip->i_devvp, + fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), + (int)fs->e2fs_bsize, NOCRED, &bp); + if (error) { + brelse(bp); + goto ioerror; + } + + EXT2_RSV_LOCK(ip); + /* Try to allocate from reservation window */ + bno = ext2_rsvalloc(fs, ip, cg, bp, bpref, size); + EXT2_RSV_UNLOCK(ip); + if (bno > 0) + goto allocated; + + brelse(bp); + EXT2_LOCK(ump); + } + + /* Otherwise, try to allocate a free block from the remaining groups.
*/ + cg = (cg + 1) % fs->e2fs_gcount; + for (i = 1; i < fs->e2fs_gcount; i++) { + if (fs->e2fs_gd[cg].ext2bgd_nbfree > 0) { + /* Read block bitmap from buffer */ + EXT2_UNLOCK(ump); + error = bread(ip->i_devvp, + fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), + (int)fs->e2fs_bsize, NOCRED, &bp); + if (error) { + brelse(bp); + goto ioerror; + } + + EXT2_RSV_LOCK(ip); + bno = ext2_rsvalloc(fs, ip, cg, bp, -1, size); + EXT2_RSV_UNLOCK(ip); + if (bno > 0) + goto allocated; + + brelse(bp); + EXT2_LOCK(ump); + } + + cg++; + if (cg == fs->e2fs_gcount) + cg = 0; + } + +allocated: + if (bno > 0) { + ip->i_next_alloc_block = lbn; + ip->i_next_alloc_goal = bno; + + ip->i_blocks += btodb(fs->e2fs_bsize); + ip->i_flag |= IN_CHANGE | IN_UPDATE; + *bnp = bno; + return (0); + } + +nospace: + EXT2_UNLOCK(ump); + ext2_fserr(fs, cred->cr_uid, "file system full"); + uprintf("\n%s: write failed, file system is full\n", fs->e2fs_fsmnt); + return (ENOSPC); + +ioerror: + ext2_fserr(fs, cred->cr_uid, "file system IO error"); + uprintf("\n%s: write failed, file system IO error\n", fs->e2fs_fsmnt); + return (EIO); +} int ext2_alloc(ip, lbn, bpref, size, cred, bnp) @@ -923,9 +1413,11 @@ start = 0; loc = skpc(0xff, len, &bbp[start]); if (loc == 0) { - printf("start = %d, len = %d, fs = %s\n", - start, len, fs->e2fs_fsmnt); - panic("ext2fs_alloccg: map corrupted"); + /* XXX: just for reservation window */ + return -1; + /*printf("start = %d, len = %d, fs = %s\n",*/ + /*start, len, fs->e2fs_fsmnt);*/ + /*panic("ext2fs_alloccg: map corrupted");*/ /* NOTREACHED */ } } diff -urN /usr/src/sys/fs/ext2fs/ext2_balloc.c new/ext2_balloc.c --- /usr/src/sys/fs/ext2fs/ext2_balloc.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_balloc.c 2010-08-19 02:47:29.000000000 +0800 @@ -49,6 +49,7 @@ #include #include #include +#include /* * Balloc defines the structure of file system storage * by allocating the physical blocks on a device given @@ -78,6 +79,9 @@ fs = ip->i_e2fs; ump = ip->i_ump; + if (ip->i_rsv == 
NULL) + ext2_init_rsv(ip); + /* * check if this is a sequential block allocation. * If so, increment next_alloc fields to allow ext2_blkpref @@ -136,9 +140,9 @@ else nsize = fs->e2fs_bsize; EXT2_LOCK(ump); - error = ext2_alloc(ip, lbn, - ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0), - nsize, cred, &newb); + error = ext2_alloc_rsv(ip, lbn, + ext2_blkpref(ip, lbn, (int)lbn, &ip->i_db[0], 0), + nsize, cred, &newb); if (error) return (error); bp = getblk(vp, lbn, nsize, 0, 0, 0); @@ -170,9 +174,9 @@ EXT2_LOCK(ump); pref = ext2_blkpref(ip, lbn, indirs[0].in_off + EXT2_NDIR_BLOCKS, &ip->i_db[0], 0); - if ((error = ext2_alloc(ip, lbn, pref, - (int)fs->e2fs_bsize, cred, &newb))) - return (error); + if ((error = ext2_alloc_rsv(ip, lbn, pref, + (int)fs->e2fs_bsize, cred, &newb))) + return (error); nb = newb; bp = getblk(vp, indirs[1].in_lbn, fs->e2fs_bsize, 0, 0, 0); bp->b_blkno = fsbtodb(fs, newb); @@ -211,7 +215,7 @@ if (pref == 0) pref = ext2_blkpref(ip, lbn, indirs[i].in_off, bap, bp->b_lblkno); - error = ext2_alloc(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb); + error = ext2_alloc_rsv(ip, lbn, pref, (int)fs->e2fs_bsize, cred, &newb); if (error) { brelse(bp); return (error); @@ -250,8 +254,8 @@ EXT2_LOCK(ump); pref = ext2_blkpref(ip, lbn, indirs[i].in_off, &bap[0], bp->b_lblkno); - if ((error = ext2_alloc(ip, - lbn, pref, (int)fs->e2fs_bsize, cred, &newb)) != 0) { + if ((error = ext2_alloc_rsv(ip, lbn, pref, + (int)fs->e2fs_bsize, cred, &newb)) != 0) { brelse(bp); return (error); } diff -urN /usr/src/sys/fs/ext2fs/ext2_inode.c new/ext2_inode.c --- /usr/src/sys/fs/ext2fs/ext2_inode.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_inode.c 2010-08-19 02:47:29.000000000 +0800 @@ -52,6 +52,7 @@ #include #include #include +#include static int ext2_indirtrunc(struct inode *, int32_t, int32_t, int32_t, int, long *); @@ -153,6 +154,11 @@ } fs = oip->i_e2fs; osize = oip->i_size; + + EXT2_RSV_LOCK(oip); + ext2_discard_rsv(oip); + EXT2_RSV_UNLOCK(oip); + /* * Lengthen the 
size of the file. We must ensure that the * last byte of the file is allocated. Since the smallest @@ -484,6 +490,10 @@ if (prtactive && vrefcnt(vp) != 0) vprint("ext2_inactive: pushing active", vp); + EXT2_RSV_LOCK(ip); + ext2_discard_rsv(ip); + EXT2_RSV_UNLOCK(ip); + /* * Ignore inodes related to stale file handles. */ @@ -525,11 +535,21 @@ if (prtactive && vrefcnt(vp) != 0) vprint("ufs_reclaim: pushing active", vp); ip = VTOI(vp); + if (ip->i_flag & IN_LAZYMOD) { ip->i_flag |= IN_MODIFIED; ext2_update(vp, 0); } vfs_hash_remove(vp); + + EXT2_RSV_LOCK(ip); + if (ip->i_rsv != NULL) { + free(ip->i_rsv, M_EXT2NODE); + ip->i_rsv = NULL; + } + EXT2_RSV_UNLOCK(ip); + mtx_destroy(&ip->i_rsv_lock); + free(vp->v_data, M_EXT2NODE); vp->v_data = 0; vnode_destroy_vobject(vp); diff -urN /usr/src/sys/fs/ext2fs/ext2_rsv_win.h new/ext2_rsv_win.h --- /usr/src/sys/fs/ext2fs/ext2_rsv_win.h 1970-01-01 08:00:00.000000000 +0800 +++ new/ext2_rsv_win.h 2010-08-19 02:47:29.000000000 +0800 @@ -0,0 +1,78 @@ +/*- + * Copyright (c) 2010, 2010 Zheng Liu + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD: src/sys/fs/ext2fs/ext2_rsv_win.h,v 0.1 2010/05/08 12:41:51 lz Exp $ + */ +#ifndef _FS_EXT2FS_EXT2_RSV_WIN_H_ +#define _FS_EXT2FS_EXT2_RSV_WIN_H_ + +#include + +#define EXT2_RSV_DEFAULT_RESERVE_BLKS 8 +#define EXT2_RSV_MAX_RESERVE_BLKS 1024 +#define EXT2_RSV_NOT_ALLOCATED 0 + +#define EXT2_RSV_LOCK(ip) mtx_lock(&(ip)->i_rsv_lock) +#define EXT2_RSV_UNLOCK(ip) mtx_unlock(&(ip)->i_rsv_lock) + +#define EXT2_TREE_LOCK(fs) mtx_lock(&(fs)->e2fs_rsv_lock) +#define EXT2_TREE_UNLOCK(fs) mtx_unlock(&(fs)->e2fs_rsv_lock) + +/* + * Reservation window entry + */ +struct ext2_rsv_win { + RB_ENTRY(ext2_rsv_win) rsv_link; /* RB tree links */ + + int32_t rsv_goal_size; /* Default reservation window size */ + int32_t rsv_alloc_hit; /* Number of allocation hits in the window */ + + int32_t rsv_start; /* First block of window */ + int32_t rsv_end; /* Last block of window */ +}; + +RB_HEAD(ext2_rsv_win_tree, ext2_rsv_win); + +static __inline int +ext2_rsv_win_cmp(const struct ext2_rsv_win *a, + const struct ext2_rsv_win *b) +{ + if (a->rsv_start < b->rsv_start) + return (-1); + if (a->rsv_start == b->rsv_start) + return (0); + + return (1); +} +RB_PROTOTYPE(ext2_rsv_win_tree, ext2_rsv_win, rsv_link, ext2_rsv_win_cmp); + +/* forward declaration */ +struct inode; +/* ext2_alloc.c */ +void ext2_init_rsv(struct inode *ip); +void ext2_discard_rsv(struct inode *ip); +int ext2_alloc_rsv(struct inode *, int32_t, int32_t, int, struct ucred *, int32_t *); + +#endif /*
!_FS_EXT2FS_EXT2_RSV_WIN_H_ */ diff -urN /usr/src/sys/fs/ext2fs/ext2_vfsops.c new/ext2_vfsops.c --- /usr/src/sys/fs/ext2fs/ext2_vfsops.c 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2_vfsops.c 2010-08-19 02:47:29.000000000 +0800 @@ -1,4 +1,4 @@ -/*- +/* * modified for EXT2FS support in Lites 1.1 * * Aug 1995, Godmar Back (gback@cs.utah.edu) @@ -61,6 +61,7 @@ #include #include #include +#include static int ext2_flushfiles(struct mount *mp, int flags, struct thread *td); static int ext2_mountfs(struct vnode *, struct mount *); @@ -95,9 +96,9 @@ static int compute_sb_data(struct vnode * devvp, struct ext2fs * es, struct m_ext2fs * fs); -static const char *ext2_opts[] = { "from", "export", "acls", "noexec", - "noatime", "union", "suiddir", "multilabel", "nosymfollow", - "noclusterr", "noclusterw", "force", NULL }; +static const char *ext2_opts[] = { "acls", "async", "export", "force", + "from", "multilabel", "noatime", "noclusterr", "noclusterw", + "noexec", "nosymfollow", "suiddir", "union", NULL }; /* * VFS Operations. 
@@ -581,6 +582,14 @@ if ((error = compute_sb_data(devvp, ump->um_e2fs->e2fs, ump->um_e2fs))) goto out; + /* Initial reservation window index and lock */ + bzero(&ump->um_e2fs->e2fs_rsv_lock, sizeof(struct mtx)); + mtx_init(&ump->um_e2fs->e2fs_rsv_lock, + "rsv tree lock", NULL, MTX_DEF); + ump->um_e2fs->e2fs_rsv_tree = malloc(sizeof(struct ext2_rsv_win_tree), + M_EXT2MNT, M_WAITOK | M_ZERO); + RB_INIT(ump->um_e2fs->e2fs_rsv_tree); + brelse(bp); bp = NULL; fs = ump->um_e2fs; @@ -680,6 +689,8 @@ g_topology_unlock(); PICKUP_GIANT(); vrele(ump->um_devvp); + free(fs->e2fs_rsv_tree, M_EXT2MNT); + mtx_destroy(&fs->e2fs_rsv_lock); free(fs->e2fs_gd, M_EXT2MNT); free(fs->e2fs_contigdirs, M_EXT2MNT); free(fs->e2fs, M_EXT2MNT); @@ -919,6 +930,10 @@ ip->i_prealloc_count = 0; ip->i_prealloc_block = 0; + bzero(&ip->i_rsv_lock, sizeof(struct mtx)); + mtx_init(&ip->i_rsv_lock, "inode rsv lock", NULL, MTX_DEF); + ip->i_rsv = NULL; + /* * Now we want to make sure that block pointers for unused * blocks are zeroed out - ext2_balloc depends on this diff -urN /usr/src/sys/fs/ext2fs/ext2fs.h new/ext2fs.h --- /usr/src/sys/fs/ext2fs/ext2fs.h 2010-01-14 22:30:54.000000000 +0800 +++ new/ext2fs.h 2010-08-19 02:47:29.000000000 +0800 @@ -38,6 +38,7 @@ #define _FS_EXT2FS_EXT2_FS_H #include +#include /* * Special inode numbers @@ -174,6 +175,9 @@ char e2fs_wasvalid; /* valid at mount time */ off_t e2fs_maxfilesize; struct ext2_gd *e2fs_gd; /* Group Descriptors */ + + struct mtx e2fs_rsv_lock; /* Protect reservation window RB tree */ + struct ext2_rsv_win_tree *e2fs_rsv_tree; /* Reservation window index */ }; /* diff -urN /usr/src/sys/fs/ext2fs/inode.h new/inode.h --- /usr/src/sys/fs/ext2fs/inode.h 2010-01-14 22:30:54.000000000 +0800 +++ new/inode.h 2010-08-19 02:47:29.000000000 +0800 @@ -100,6 +100,10 @@ int32_t i_gen; /* Generation number. */ u_int32_t i_uid; /* File owner. */ u_int32_t i_gid; /* File group. 
*/ + + /* Fields for reservation window */ + struct mtx i_rsv_lock; /* Protects i_rsv */ + struct ext2_rsv_win *i_rsv; /* Reservation window */ }; /* --------------070305070503050500020300-- From owner-freebsd-fs@FreeBSD.ORG Fri Aug 20 17:03:30 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 079C710656A3; Fri, 20 Aug 2010 17:03:30 +0000 (UTC) (envelope-from ed@hoeg.nl) Received: from mx0.hoeg.nl (unknown [IPv6:2a01:4f8:101:5343::aa]) by mx1.freebsd.org (Postfix) with ESMTP id 996C58FC12; Fri, 20 Aug 2010 17:03:29 +0000 (UTC) Received: by mx0.hoeg.nl (Postfix, from userid 1000) id B87BA2A28CF3; Fri, 20 Aug 2010 19:03:27 +0200 (CEST) Date: Fri, 20 Aug 2010 19:03:27 +0200 From: Ed Schouten To: Daichi GOTO Message-ID: <20100820170327.GL2978@hoeg.nl> References: <4C6B9F51.1060009@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="cW0eHRJ76X8TDo3d" Content-Disposition: inline In-Reply-To: <4C6B9F51.1060009@freebsd.org> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Whiteout support for tmpfs [Was: unionfs a little improvement] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Aug 2010 17:03:30 -0000 --cW0eHRJ76X8TDo3d Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi all, Even though the proposed fix for unionfs would still be nice to have in SVN, I just wrote a patch for tmpfs to add support for whiteouts: http://80386.nl/pub/tmpfs-whiteout.txt Basically I've implemented it by allowing directory entries to refer to NULL inodes, to indicate the entry is a whiteout. 
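To make the idea concrete for readers who have not opened the patch: a directory entry whose node pointer is NULL stands for a whiteout. The following is only a minimal user-space sketch of that idea, not code taken from the tmpfs patch; every name in it (`node_sketch`, `dirent_sketch`, `de_node`, `is_whiteout`, `make_whiteout`, `lookup_sketch`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a tmpfs node and a directory entry. */
struct node_sketch {
	int ns_id;
};

struct dirent_sketch {
	char de_name[32];
	struct node_sketch *de_node;	/* NULL means "this entry is a whiteout" */
};

/* An entry is a whiteout exactly when it refers to no node. */
static bool
is_whiteout(const struct dirent_sketch *de)
{
	assert(de != NULL);
	return (de->de_node == NULL);
}

/* Turn an existing entry into a whiteout, keeping its name. */
static void
make_whiteout(struct dirent_sketch *de)
{
	assert(de != NULL);
	de->de_node = NULL;
}

/* A lookup can find a node, find a whiteout, or find nothing. */
enum lookup_result { LOOKUP_FOUND, LOOKUP_WHITEOUT, LOOKUP_ABSENT };

static enum lookup_result
lookup_sketch(const struct dirent_sketch *dir, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(dir[i].de_name, name) != 0)
			continue;
		return (is_whiteout(&dir[i]) ? LOOKUP_WHITEOUT : LOOKUP_FOUND);
	}
	return (LOOKUP_ABSENT);
}
```

The sketch flips the entry in place for brevity; a real implementation may instead free the entry and allocate a fresh one for the whiteout.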
I think the patch should work pretty well, but what I dislike about it, is that when it removes a file and replaces it by a whiteout, it deallocates the entire directory entry, followed by the allocation of a new directory entry for the whiteout. This could be done more efficiently, but the problem is that it turns the code into a mess. Any comments? --=20 Ed Schouten WWW: http://80386.nl/ --cW0eHRJ76X8TDo3d Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) iEYEARECAAYFAkxutV8ACgkQ52SDGA2eCwWV2wCdFgiJwRp3LqMCyav+AJfLHg6b /cIAn2MtKM/AEmxHH7mW7WpdU8dKPMdT =UF2+ -----END PGP SIGNATURE----- --cW0eHRJ76X8TDo3d-- From owner-freebsd-fs@FreeBSD.ORG Fri Aug 20 23:27:21 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9A93A10656A8; Fri, 20 Aug 2010 23:27:21 +0000 (UTC) (envelope-from arundel@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8B0D18FC18; Fri, 20 Aug 2010 23:27:21 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7KNRLrD070238; Fri, 20 Aug 2010 23:27:21 GMT (envelope-from arundel@freefall.freebsd.org) Received: (from arundel@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7KNRK2s070234; Fri, 20 Aug 2010 23:27:20 GMT (envelope-from arundel) Date: Fri, 20 Aug 2010 23:27:20 GMT Message-Id: <201008202327.o7KNRK2s070234@freefall.freebsd.org> To: hsn@netmag.cz, arundel@FreeBSD.org, mckusick@FreeBSD.org, freebsd-fs@FreeBSD.org From: arundel@FreeBSD.org Cc: Subject: Re: docs/61716: newfs(8) code and manpage are out of sync X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Fri, 20 Aug 2010 23:27:21 -0000

Synopsis: newfs(8) code and manpage are out of sync

State-Changed-From-To: open->analyzed
State-Changed-By: arundel
State-Changed-When: Fri Aug 20 23:13:12 UTC 2010
State-Changed-Why:
The originator of the PR described the problem very accurately. However, the
location where the default max-extent-size gets set has moved from
/usr/src/sbin/newfs/newfs.c to /usr/src/sbin/newfs/mkfs.c:

	if (maxbsize == 0)
		maxbsize = bsize;
	if (maxbsize < bsize || !POWEROF2(maxbsize)) {
		sblock.fs_maxbsize = sblock.fs_bsize;
		printf("Extent size set to %d\n", sblock.fs_maxbsize);
	} else if (sblock.fs_maxbsize > FS_MAXCONTIG * sblock.fs_bsize) {
		sblock.fs_maxbsize = FS_MAXCONTIG * sblock.fs_bsize;
		printf("Extent size reduced to %d\n", sblock.fs_maxbsize);
	} else {
		sblock.fs_maxbsize = maxbsize;
	}

I'll submit a patch for newfs(8) right away.

Responsible-Changed-From-To: mckusick->freebsd-fs
Responsible-Changed-By: arundel
Responsible-Changed-When: Fri Aug 20 23:13:12 UTC 2010
Responsible-Changed-Why:
Over to maintainer(s).
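For anyone checking the manpage wording against the behaviour, the selection of the extent size can be restated as a small standalone function. This is a sketch of my reading of the intent, not a transcription of mkfs.c: it compares the requested value against the upper limit, the names `effective_maxbsize`, `is_power_of_2` and `MAXCONTIG_SKETCH` are hypothetical, and the limit of 16 blocks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXCONTIG_SKETCH 16	/* assumed upper limit, in file system blocks */

static bool
is_power_of_2(int x)
{
	return (x > 0 && (x & (x - 1)) == 0);
}

/*
 * Return the extent size that results from a requested maxbsize
 * (0 means "not specified") on a file system with block size bsize.
 */
static int
effective_maxbsize(int maxbsize, int bsize)
{
	assert(bsize > 0);
	if (maxbsize == 0)
		maxbsize = bsize;		/* default: one block */
	if (maxbsize < bsize || !is_power_of_2(maxbsize))
		return (bsize);			/* too small or not a power of 2 */
	if (maxbsize > MAXCONTIG_SKETCH * bsize)
		return (MAXCONTIG_SKETCH * bsize);	/* clamp to the maximum */
	return (maxbsize);
}
```

Under these assumptions an unspecified value yields one block, and a power-of-two request is honoured between one block and 16 blocks.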
http://www.freebsd.org/cgi/query-pr.cgi?pr=61716 From owner-freebsd-fs@FreeBSD.ORG Fri Aug 20 23:40:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 846CC1065695 for ; Fri, 20 Aug 2010 23:40:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 73DB88FC08 for ; Fri, 20 Aug 2010 23:40:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7KNe3cw080756 for ; Fri, 20 Aug 2010 23:40:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7KNe3ur080755; Fri, 20 Aug 2010 23:40:03 GMT (envelope-from gnats) Date: Fri, 20 Aug 2010 23:40:03 GMT Message-Id: <201008202340.o7KNe3ur080755@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Alexander Best Cc: Subject: Re: docs/61716: newfs(8) code and manpage are out of sync X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Alexander Best List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Aug 2010 23:40:03 -0000 The following reply was made to PR docs/61716; it has been noted by GNATS. From: Alexander Best To: bug-followup@freebsd.org Cc: Subject: Re: docs/61716: newfs(8) code and manpage are out of sync Date: Fri, 20 Aug 2010 23:39:38 +0000 --opJtzjQTFsWo+cga Content-Type: text/plain; charset=us-ascii Content-Disposition: inline This patch should sync the newfs code with newfs(8). cheers. 
alex -- a13x --opJtzjQTFsWo+cga Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="newfs.8.diff" Index: /usr/src/sbin/newfs/newfs.8 =================================================================== --- /usr/src/sbin/newfs/newfs.8 (revision 211393) +++ /usr/src/sbin/newfs/newfs.8 (working copy) @@ -125,8 +125,9 @@ .It Fl d Ar max-extent-size The file system may choose to store large files using extents. This parameter specifies the largest extent size that may be used. -It is presently limited to its default value which is 16 times -the file system blocksize. +The default value is the file system blocksize. +It is presently limited to a maximum value of 16 times the +file system blocksize and a minimum value of the file system blocksize. .It Fl e Ar maxbpg Indicate the maximum number of blocks any single file can allocate out of a cylinder group before it is forced to begin --opJtzjQTFsWo+cga-- From owner-freebsd-fs@FreeBSD.ORG Sat Aug 21 11:06:24 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0285A1065698; Sat, 21 Aug 2010 11:06:24 +0000 (UTC) (envelope-from arundel@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id CFB088FC15; Sat, 21 Aug 2010 11:06:23 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7LB6N6E087617; Sat, 21 Aug 2010 11:06:23 GMT (envelope-from arundel@freefall.freebsd.org) Received: (from arundel@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7LB6NRK087613; Sat, 21 Aug 2010 11:06:23 GMT (envelope-from arundel) Date: Sat, 21 Aug 2010 11:06:23 GMT Message-Id: <201008211106.o7LB6NRK087613@freefall.freebsd.org> To: arundel@FreeBSD.org, freebsd-fs@FreeBSD.org, freebsd-fs@FreeBSD.org From: arundel@FreeBSD.org Cc: 
Subject: Re: docs/61716: [patch] newfs(8) code and manpage are out of sync X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Aug 2010 11:06:24 -0000 Synopsis: [patch] newfs(8) code and manpage are out of sync Class-Changed-From-To: sw-bug->doc-bug Class-Changed-By: arundel Class-Changed-When: Sat Aug 21 11:05:10 UTC 2010 Class-Changed-Why: This is not a sw-bug. http://www.freebsd.org/cgi/query-pr.cgi?pr=61716 From owner-freebsd-fs@FreeBSD.ORG Sat Aug 21 14:20:05 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A6E01065674 for ; Sat, 21 Aug 2010 14:20:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 20FBC8FC08 for ; Sat, 21 Aug 2010 14:20:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o7LEK5SG078218 for ; Sat, 21 Aug 2010 14:20:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o7LEK4BT078217; Sat, 21 Aug 2010 14:20:05 GMT (envelope-from gnats) Date: Sat, 21 Aug 2010 14:20:05 GMT Message-Id: <201008211420.o7LEK4BT078217@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: "Aldis Berjoza" Cc: Subject: Re: kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD to freeze in few seconds X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Aldis Berjoza List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 21 Aug 2010 14:20:05 -0000 The following reply was made to PR kern/137037; it has been noted by 
GNATS. From: "Aldis Berjoza" To: bug-followup@freebsd.org, killasmurf86@gmail.com Cc: Subject: Re: kern/137037: [zfs] [hang] zfs rollback on root causes FreeBSD to freeze in few seconds Date: Sat, 21 Aug 2010 17:19:19 +0300 On FreeBSD 8.1 everything works fine now :D -- Aldis Berjoza